
Encoding problem between C# TCP server and Java TCP Client

I'm facing an encoding issue for which I haven't been able to find a solution.

I have a C# TCP server, running as a Windows service, which receives and responds with XML. The problem arises when the output contains special characters, such as Spanish characters with accents (á, é, í and others).

The server response is encoded as UTF-8, and the Java client reads it as UTF-8. But when I print the output, the character is completely different.

This problem only happens in the Java client (a C# TCP client works as expected).

The following is a snippet of the server code that shows the encoding issue:

C# Server:

    byte[] destBytes = System.Text.Encoding.UTF8.GetBytes("á");
    try
    {
        clientStream.Write(destBytes, 0, destBytes.Length);
        clientStream.Flush();
    }
    catch (Exception ex)
    {
        LogErrorMessage("Error en SendResponseToClient: Detalle::", ex);
    }

Java Client:

socket.connect(new InetSocketAddress(param.getServerIp(), param.getPort()), 20000);
InputStream sockInp = socket.getInputStream();
InputStreamReader streamReader = new InputStreamReader(sockInp, Charset.forName("UTF-8"));
sockReader =  new BufferedReader(streamReader);
String tmp = null;
while((tmp = sockReader.readLine()) != null){
  System.out.println(tmp);
}

For this simple test, the output shown is:

ß

I did some testing by printing out the byte[] in each language. In C#, á is output as: 195, 161

In Java, the byte[] read prints as: -61, -95

Does this have to do with the byte type being signed in Java and unsigned in C#?
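On the signed/unsigned question: a Java byte is signed (-128 to 127) and a C# byte is unsigned (0 to 255), so -61 and -95 are the same bit patterns as 195 and 161. A minimal sketch to confirm this is below; the class name SignedByteDemo and the helper toUnsigned are made up for illustration and are not part of the original client.

```java
import java.nio.charset.StandardCharsets;

public class SignedByteDemo {
    // Hypothetical helper: map Java's signed bytes (-128..127)
    // to the unsigned values (0..255) that C# would print.
    static int[] toUnsigned(byte[] bytes) {
        int[] out = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            out[i] = bytes[i] & 0xFF; // mask keeps only the low 8 bits
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] fromSocket = { -61, -95 }; // values printed by the Java client
        int[] unsigned = toUnsigned(fromSocket);
        System.out.println(unsigned[0] + ", " + unsigned[1]); // prints "195, 161"

        // The same two bytes decode to U+00E1 (á) as UTF-8.
        String decoded = new String(fromSocket, StandardCharsets.UTF_8);
        System.out.println(decoded.equals("\u00E1")); // prints "true"
    }
}
```

If this holds on your machine, the bytes arriving at the client are identical to the ones the server sent, and the difference in printed values is only a matter of how each language displays a byte.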

Any feedback is greatly appreciated.

To me this seems like an endianness problem... you can check that by reversing the bytes in Java before printing the string...

This would usually be solved by including a BOM... see http://de.wikipedia.org/wiki/Byte_Order_Mark
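The byte-reversal check suggested above can be sketched as follows. The class and method names are made up for illustration. Note that UTF-8 is a byte-oriented encoding with a fixed byte order, so the reversed bytes are not expected to decode to á; this sketch only lets you see that for yourself.

```java
import java.nio.charset.StandardCharsets;

public class ReverseBytesCheck {
    // Hypothetical helper: reverse the byte order, then decode as UTF-8.
    static String decodeReversed(byte[] utf8) {
        byte[] reversed = new byte[utf8.length];
        for (int i = 0; i < utf8.length; i++) {
            reversed[i] = utf8[utf8.length - 1 - i];
        }
        // Malformed input is replaced with U+FFFD by this constructor.
        return new String(reversed, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] received = { (byte) 195, (byte) 161 }; // UTF-8 bytes of á
        String result = decodeReversed(received);
        System.out.println(result.equals("\u00E1")); // prints "false"
    }
}
```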

Are you sure that's not a Unicode character you are attempting to encode to bytes as UTF-8 data?

I found that the post below has a useful way of testing whether the data in that string is correct UTF-8 before you send it.

How to test an application for correct encoding (eg UTF-8)
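One possible explanation for the ß in the question, offered as an assumption rather than a diagnosis: the Java client decodes the bytes correctly, but System.out writes the string in the JVM's default charset while the console displays those bytes in a different code page. The sketch below simulates that, assuming windows-1252 as the JVM default and IBM850 as the console code page (a common Windows combination, but not confirmed by the question). The class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ConsoleCharsetDemo {
    // Simulates: decode UTF-8 from the socket, encode with the charset
    // used for printing, then interpret those bytes the way the console would.
    static String asShownOnConsole(byte[] utf8, Charset printCharset, Charset consoleCharset) {
        String decoded = new String(utf8, StandardCharsets.UTF_8); // "á" (U+00E1)
        byte[] printed = decoded.getBytes(printCharset);           // single byte 0xE1
        return new String(printed, consoleCharset);                // what the console draws
    }

    public static void main(String[] args) {
        byte[] received = { (byte) 195, (byte) 161 };    // bytes sent by the server
        Charset ansi = Charset.forName("windows-1252"); // assumed JVM default charset
        Charset oem = Charset.forName("IBM850");        // assumed console code page
        String shown = asShownOnConsole(received, ansi, oem);
        System.out.println(shown.equals("\u00DF"));     // prints "true" (U+00DF is ß)
    }
}
```

If this matches your setup, the socket code is fine and only the printing is at fault. Wrapping the output in new PrintStream(System.out, true, "UTF-8") together with a console set to UTF-8, or writing the output to a file and opening it as UTF-8, would be ways to check.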
