
How to decode a Unicode byte stream into characters

I am writing a server program that reads a UTF-8 encoded byte stream from a network socket and continuously decodes it into characters.

For characters that take more than one byte to represent, I sometimes receive only the first byte of the character on the socket, and the program decodes that byte into an invalid character.

For example, the client runs the code below:

  String s = "Cañ";

  byte[] b = s.getBytes("UTF-8");

  //sending first three bytes
  send(b, 0, 3);   //send(byte[], offset, length)

  //sending last byte
  send(b, 3, 1);

When the server receives the first three bytes, it decodes them to Ca?.
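The effect can be reproduced without a socket. This is a minimal sketch, assuming the server decodes each received chunk on its own with `new String(...)` (the actual server code is not shown in the question); the class name `SplitDemo` and the helper `decodeChunk` are made up for illustration:

```java
import java.nio.charset.StandardCharsets;

public class SplitDemo {
    // Decodes one chunk in isolation, as a naive server might do per read.
    static String decodeChunk(byte[] data, int offset, int length) {
        return new String(data, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Ca" plus n-tilde (U+00F1): 4 bytes, because the last character takes two.
        byte[] b = "Ca\u00F1".getBytes(StandardCharsets.UTF_8);

        String first = decodeChunk(b, 0, 3);  // chunk ends in the middle of the last character
        String second = decodeChunk(b, 3, 1); // chunk is a lone continuation byte

        // Malformed input is replaced with U+FFFD, which many consoles display as '?'.
        System.out.println(first.charAt(2) == '\uFFFD');
        System.out.println(second.equals("\uFFFD"));

        // Decoding all four bytes together gives the original string back.
        System.out.println(decodeChunk(b, 0, 4).equals("Ca\u00F1"));
    }
}
```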

How can I detect character boundaries on the server?

The code above is contrived to reproduce the issue. I believe TCP sometimes splits the character across packets.

TCP is reliable, so data is not lost, but it is a byte stream that does not preserve message boundaries: what was sent in one call can arrive split across several reads. You can design a protocol yourself. By marking the start and end of each data frame with a tag, you can easily check whether you have received the complete data.
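A minimal sketch of such framing on the receiving side. It assumes (none of this is in the original post) that the sender wraps each message in the control bytes STX (0x02) and ETX (0x03) and that the payload itself never contains them. Both values are below 0x80, so they cannot occur inside a multi-byte UTF-8 sequence. The class `FrameReader` and its method `feed` are hypothetical names:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class FrameReader {
    static final byte STX = 0x02; // assumed start-of-frame tag
    static final byte ETX = 0x03; // assumed end-of-frame tag

    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private boolean inFrame = false;

    // Feed whatever a socket read returned; returns the messages completed by this chunk.
    public List<String> feed(byte[] data, int offset, int length) {
        List<String> messages = new ArrayList<>();
        for (int i = offset; i < offset + length; i++) {
            byte c = data[i];
            if (c == STX) {
                pending.reset();   // start (or restart) a frame
                inFrame = true;
            } else if (c == ETX && inFrame) {
                // Decode only once the whole frame is present, so no character is cut in half.
                messages.add(new String(pending.toByteArray(), StandardCharsets.UTF_8));
                inFrame = false;
            } else if (inFrame) {
                pending.write(c);  // buffer payload bytes, including partial characters
            }
            // Bytes outside a frame are ignored.
        }
        return messages;
    }
}
```

With the example from the question, the sender would transmit 0x02, the four bytes of the string, then 0x03. Feeding the first four of those six bytes returns an empty list, and the complete string is returned only after the remaining two bytes have been fed. A length prefix in front of each message would serve the same purpose as the two tags.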

