How can I send a Java string with Unicode characters to C++ via socket, without strange characters?

I am working on an application for my Android phone so I can import SMS messages, read them and reply to them. Everything worked just as I planned while I was programming the server and the client. Whenever I had a problem, a Google search gave me a solution, but this time, for the first time in my life, I am asking you for help.

The Problem:

The problem is that when the client (Java) sends SMS content that contains Unicode characters such as "å, ä, ö", the C++ server cannot read them.

My program sends the packet size first, so the other side knows how big the incoming packet will be. So eg Java calculates the packet will be 121 bytes and sends it to the server. But if the packet contains a few non-ANSI characters, C++ does not receive 121 bytes but 123, and the non-ANSI characters come out strange.

I've been Googling all day without finding an answer. I've tried wchar_t in C++, I've tried setting everything in Java to be sent as UTF-8, and I've been debugging for hours to recreate the problem and try different things, but without success!

So what is going on here? How can I get the text from Java to C++ with the correct size and the same representation as in Java? Packets without Unicode characters work fine.

Thank you guys! A little tired atm, hope I didn't miss anything. The code could be a little messy, it is only a prototype yet.

P.S. This is a TCP connection.

-Server C++ recv Function-

bool Receive( std::string& msg)
{
    zReadMutex.lock();

    try
    {
        int errCode;
        unsigned int packetSize = 0;
        char packetSizeBuffer[4];

        //Get packet size
        errCode = recv(zSocket, packetSizeBuffer, sizeof(packetSizeBuffer), 0);

        if ( errCode == SOCKET_ERROR || errCode == 0)
        {
            throw NetworkException("Failed Receiving Packet Size!", WSAGetLastError());
        }

        //Convert
        packetSize = CharArrayToUnsignedInt(packetSizeBuffer);

        if (packetSize == 0)
        {
            throw NetworkException("Connection Closed!");
        }

        //Calculate chunks

        //Total bits received
        unsigned int totalBits = 0;
        //Calculate number of chunks that will arrive
        int chunks = CaculateChunks(packetSize);
        //Counter for the chunk loop
        int count = 0;
        //Add to message for every chunk received
        std::string message = "";

        //Just a temp check
        if (chunks > 15)
        {
            throw NetworkException("Connection Closed!");
        }

        //Get Chunks
        while (count < chunks)
        {
            char* buffer = new char[zMaxChunkSize];

            if ((errCode = recv(zSocket, buffer, zMaxChunkSize, 0)) <= 0)
            {
                if (errCode < 0)
                {
                    delete [] buffer;
                    throw NetworkException("Failed Receiving Packet Data!", WSAGetLastError());
                }
                else
                {
                    delete [] buffer;
                    throw NetworkException("Connection Closed!");
                }

            }

            totalBits += errCode;
            count++;
            //Append exactly the bytes received; buffer is not null-terminated
            message.append(buffer, errCode);

            delete [] buffer;

        }

        if (packetSize != totalBits)
        {
            throw NetworkException("Message is not expected size!");
        }

        message.resize(totalBits);
        msg = std::string(message);

    }
    catch(...)
    {
        zReadMutex.unlock();
        throw;
    }

    zReadMutex.unlock();
    return true;
}

- Client Java Send Function -

public boolean InitSender()
{
    if(mSocket == null)
        return false;

    try {
        //Auto flush is false, but it auto flush anyways
        out = new PrintStream(mSocket.getOutputStream(), false, "UTF-8");

    } catch (IOException e) {
        e.printStackTrace();
        return false;
    }

    return true;
}

public synchronized void SendMessage(final String a)
{
    int size = 0;
    size = a.length();

    //Send size
    out.print(size);

    //Chunk it
    int chunks = CalculateChunks(a);
    String[] data = SplitToChunks(a, chunks);

    for (String message : data)
    {
        //Send data
        out.print(message);
    }
}

"So eg Java calculates the packet will be 121 bytes and sends it to the server."

size = a.length();
//Send size
out.print(size);

That code doesn't match the description: .length() on a Java String doesn't count bytes. You're sending the number of Java char elements in the string, and a Java char is a two-byte UTF-16 code unit.

  out.print(message); 

message is a Java String. You need to look at how that String gets converted into bytes to be sent across the network connection. There's no guarantee that this conversion creates the same number of bytes as there were Java chars in the string. In particular, if the string is converted to UTF-8, then some individual Java char values will be converted to two or three bytes.

You need to do the conversion before sending the data so that you can count the actual number of bytes being sent.
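
To make the mismatch concrete, here is a minimal sketch comparing String.length() with the number of bytes UTF-8 encoding actually produces. The class name LengthDemo, its two helper methods and the sample string are made up for illustration and are not part of the original code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original client.
class LengthDemo {
    // Number of UTF-16 code units (Java chars); this is what String.length() returns.
    static int charCount(String s) {
        return s.length();
    }

    // Number of bytes that end up on the wire when the string is encoded as UTF-8.
    static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // "blåbär", written with escapes to avoid source-file encoding issues
        String sms = "bl\u00e5b\u00e4r";
        System.out.println(charCount(sms));     // prints 6
        System.out.println(utf8ByteCount(sms)); // prints 8
    }
}
```

The two extra bytes come from å and ä, which take two bytes each in UTF-8. That is consistent with the gap between 121 and 123 bytes described in the question.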


On the C++ side, a std::string is a sequence of C++ char elements, which aren't the same as Java chars. A C++ char is a single byte. In your code the std::string will contain the same data you read off the network; if the client sends UTF-8 data, then the std::string holds UTF-8 data. To display the string you'll need to use an API that handles whatever that encoding is, or convert it. Otherwise it will look like some of the characters are 'strange'.


Here's a reasonable start on learning some of the things you need to know:

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

Transmitting as UTF-8 bytes is fine.

The length in bytes can be obtained with:

byte[] bytes = a.getBytes(StandardCharsets.UTF_8);
int size = bytes.length;

Now comes the problem with chunk sizes, which are normally counted in bytes.

So that you don't have to deal with half characters (a multi-byte sequence or a surrogate pair split between two chunks), it might be better not to use a PrintStream, but to send byte[] chunks over a (binary) OutputStream.
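
As a rough sketch of that idea, the code below encodes the string once, sends the byte count as a 4-byte prefix, and then writes the payload in byte chunks. Everything named here is an assumption for illustration: the class FramedSender, the methods send and receive, and in particular the big-endian 4-byte length prefix, since the original does not show how CharArrayToUnsignedInt decodes its input. The receive method exists only to check the framing in Java; the real receiver is the C++ server.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the wire format (4-byte big-endian length + UTF-8 payload) is an assumption.
class FramedSender {
    // Encode once, then send <length><payload> with the payload split into byte chunks.
    static void send(OutputStream rawOut, String text, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        byte[] payload = text.getBytes(StandardCharsets.UTF_8);
        DataOutputStream out = new DataOutputStream(rawOut);
        out.writeInt(payload.length); // byte count, not char count; written high byte first
        for (int off = 0; off < payload.length; off += chunkSize) {
            int len = Math.min(chunkSize, payload.length - off);
            // A chunk boundary may fall inside a multi-byte character;
            // that is harmless because only raw bytes are being moved.
            out.write(payload, off, len);
        }
        out.flush();
    }

    // Reads one frame back; used here only to verify the sketch.
    static String receive(InputStream rawIn) throws IOException {
        DataInputStream in = new DataInputStream(rawIn);
        byte[] payload = new byte[in.readInt()];
        in.readFully(payload); // keeps reading until the whole payload has arrived
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        send(wire, "bl\u00e5b\u00e4r", 3);
        System.out.println(wire.size()); // 4-byte prefix plus 8 payload bytes
        System.out.println(receive(new ByteArrayInputStream(wire.toByteArray())));
    }
}
```

Note that writeInt writes four binary bytes, whereas print(size) on a PrintStream writes the decimal digits as text. On the C++ side the prefix has to be decoded with the same byte order, and recv has to be called in a loop until packetSize bytes have arrived, because a single recv call may return fewer bytes than requested.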

On the C++ side, a char is one byte (sizeof(char) == 1 by definition), so a std::string can hold a UTF-8 sequence of bytes as-is. You'll need extra code to convert it to a wstring, but you could just as well save it unchanged in a (UTF-8) file or database.

I found a solution that gives me the correct representation of the string in the C++ application.

Thanks for your help! I tried everything you suggested but was not able to solve my problem with it, although it pointed me in the right direction. One problem remains: I cannot get the same byte size on the server, so I gave up and rewrote my recv function to parse the incoming strings for start and end markers instead of relying on a packet size. So I more or less threw out the old approach. There is probably a solution to that problem as well, but I'm tired of it, hehe.

I changed the encoding to ISO-8859-1, and it worked for me. I found a forum thread where someone asked how to convert a Java string to a C string, so I used that method and it worked amazingly well. I had also been using the wrong output class in the Java client: first PrintStream, then PrintWriter. They seem to be meant for text only, so I think they gave me wrong results on the C++ server. DataOutputStream was the way to go for sending.
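
A likely reason the switch to ISO-8859-1 helped is that this charset encodes every character it supports as exactly one byte, so the byte count matches String.length() again. It only covers Latin-1, though, so characters outside that range cannot be represented. The sketch below shows both effects; the class name Latin1Demo, its helper methods and the sample strings are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original client.
class Latin1Demo {
    // Number of bytes produced when the string is encoded as ISO-8859-1.
    static int latin1ByteCount(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    // True if every character of s can be represented in ISO-8859-1.
    static boolean fitsLatin1(String s) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        String swedish = "bl\u00e5b\u00e4r"; // "blåbär"
        String euro = "5 \u20ac";             // "5 €"
        System.out.println(latin1ByteCount(swedish)); // prints 6, same as swedish.length()
        System.out.println(fitsLatin1(swedish));      // prints true
        System.out.println(fitsLatin1(euro));         // prints false
        // getBytes() silently substitutes '?' (0x3F) for characters it cannot encode.
        System.out.println(euro.getBytes(StandardCharsets.ISO_8859_1)[2]); // prints 63
    }
}
```

If the SMS text can contain characters outside Latin-1, a check like fitsLatin1 at least makes the loss visible before sending.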

-JAVA CLIENT-

public NetworkSender(Socket s)
{
    mSocket = s;
    mEnc = Charset.forName("ISO-8859-1").newEncoder();
}

public boolean InitSender(){
    if(mSocket == null)
        return false;

    try {
        out = new DataOutputStream(mSocket.getOutputStream());

    } catch (IOException e) {
        e.printStackTrace();
        return false;
    }

    return true;
}

public synchronized boolean SendMessage(final String a) {

    String str_msg = a;
    str_msg = START_PACKET_INDICATION + a + END_PACKET_INDICATION;

    byte[] msg = StringEncodeCString(str_msg, false);

    try {
        out.write(msg);
    } catch (IOException e) {
        e.printStackTrace();
        return false;
    }

    return true;
}

private byte[] StringEncodeCString(String msg, boolean zeroTerminate)
{
    //Reserve one extra byte if a trailing '\0' is requested
    int zero = 0;

    if(zeroTerminate)
        zero = 1;

    //ISO-8859-1 maps each char to one byte, so length() is also the byte count
    int len = msg.length();
    byte b[] = new byte[len + zero];
    ByteBuffer bbuf = ByteBuffer.wrap(b);
    mEnc.encode(CharBuffer.wrap(msg), bbuf, true);

    if(zeroTerminate)
        b[len] = 0;

    return b;
}

-C++ SERVER-

bool NetworkChannel::Receive( std::string& msg)
{
    zReadMutex.lock();

    try
    {
        int errCode;
        char *buffer = new char [zMaxChunkSize];
        std::size_t start_pos;
        std::size_t end_pos;
        std::string startEnd;

        //Check buffer
        if (zSaveBufferString != "")
        {

            startEnd = GetStartEndIndicatorSubstr(zSaveBufferString, start_pos, end_pos);

            if (startEnd == "")
            {
                //Nothing inside buffer, continue
            }

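            else if (!EraseStartEnd(startEnd))
            {
                //catch(...) below unlocks the mutex, so don't unlock here as well
                delete [] buffer;
                throw NetworkException("Failed to concat message!");
            }
            else
            {
                zSaveBufferString.erase(start_pos, end_pos + start_pos);
                msg = startEnd;
                //Free the receive buffer before the early return
                delete [] buffer;
                zReadMutex.unlock();
                return true;
            }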

        }

        errCode = recv(zSocket, buffer, zMaxChunkSize, 0);

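        if (errCode == SOCKET_ERROR || errCode == 0)
        {
            //Save the error code first, then free the buffer;
            //catch(...) below unlocks the mutex, so don't unlock here as well
            int wsaError = WSAGetLastError();
            delete [] buffer;
            throw NetworkException("Failed Receiving Packet Size!", wsaError);
        }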

        //Copy exactly the bytes received; buffer is not null-terminated
        std::string temp(buffer, errCode);

        zSaveBufferString += temp;

        //Find a Start and end subStr to translate messages
        startEnd = GetStartEndIndicatorSubstr(zSaveBufferString, start_pos, end_pos);

        if (startEnd == "")
        {
            delete[]buffer;

            zReadMutex.unlock();
            return false;
        }

        if( !EraseStartEnd(startEnd) )
        {
            delete[]buffer;

            //catch(...) below unlocks the mutex, so don't unlock here as well
            throw NetworkException("Failed to erase startEnd!");
        }

        zSaveBufferString.erase(start_pos, end_pos + start_pos);

        msg = startEnd;

        delete [] buffer;

    }
    catch(...)
    {
        zReadMutex.unlock();
        throw;
    }

    zReadMutex.unlock();
    return true;
}


 