
Getting a Unicode string from a raw TCP stream in C#

I am trying to modify some software written in C#, but I am not really a developer. The code reads data from a client and extracts values from it. The problem is that whenever a value from the client contains non-English characters, it comes out as gibberish. The code in question is:

public static string ReadNT(BinaryReader stream)
{
  string ret = "";
  byte addByte = 0x00;
  do {
    addByte = ReadByte(stream);
    if (addByte != 0x00)
      ret += (char)addByte; // treats every byte as one character
  } while (addByte != 0x00);
  return ret;
}

As far as I can tell, it walks through the stream and converts each byte to a character, one at a time, until it hits a zero byte. The problem is that this doesn't work with Unicode/UTF-8 text. Is there a way to build the string so that it handles UTF-8 values correctly?

Try this:

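// Requires: using System.Collections.Generic; using System.IO; using System.Text;
public static string ReadNT(BinaryReader stream)
{
    // Collect the raw bytes first; decode only once the terminator is seen.
    List<byte> bytes = new List<byte>();
    byte addByte = 0x00;

    do
    {
        addByte = ReadByte(stream);

        if (addByte != 0x00)
        {
            bytes.Add(addByte); // keep the raw byte, no cast to char
        }
    } while (addByte != 0x00);

    // Decode the whole byte sequence as UTF-8 in one go.
    return Encoding.UTF8.GetString(bytes.ToArray());
}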

You can't convert the bytes to characters one at a time, because in UTF-8 a single character may be encoded as more than one byte. That is why I use a List<byte> to gather all the bytes up to the terminating zero and decode them together.
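To make the multi-byte point concrete, here is a minimal sketch. The class and method names (PerByteCastDemo, DecodePerByte) are made up for illustration; DecodePerByte simply mimics the per-byte cast from the question.

```csharp
using System;
using System.Text;

public static class PerByteCastDemo
{
    // Hypothetical helper: casts each byte to a char on its own,
    // the same way the code in the question does.
    public static string DecodePerByte(byte[] data)
    {
        var sb = new StringBuilder();
        foreach (byte b in data)
            sb.Append((char)b);
        return sb.ToString();
    }

    public static void Main()
    {
        // U+00E9 (e with acute accent) is encoded in UTF-8 as two bytes: 0xC3 0xA9.
        byte[] data = Encoding.UTF8.GetBytes("\u00E9");

        Console.WriteLine(data.Length);                                // 2
        Console.WriteLine(DecodePerByte(data) == "\u00E9");            // False: two wrong chars
        Console.WriteLine(Encoding.UTF8.GetString(data) == "\u00E9");  // True
    }
}
```

The per-byte version turns the two bytes into two separate characters (U+00C3 and U+00A9), which is the kind of gibberish described in the question.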

The big caveat is that you need to be sure the client really is sending UTF-8 encoded text. If it uses a different encoding, decoding as UTF-8 will still produce wrong characters.
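If you are not sure what the client sends, one option is to decode strictly so that invalid UTF-8 is reported rather than silently replaced. The sketch below is only an illustration: the StrictUtf8 class and TryDecode method are hypothetical names, and it relies on the UTF8Encoding constructor flag throwOnInvalidBytes.

```csharp
using System;
using System.Text;

public static class StrictUtf8
{
    // No BOM on output, and throw on byte sequences that are not valid UTF-8.
    private static readonly UTF8Encoding Strict =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    // Hypothetical helper: returns false when the bytes are not valid UTF-8.
    public static bool TryDecode(byte[] data, out string text)
    {
        try
        {
            text = Strict.GetString(data);
            return true;
        }
        catch (DecoderFallbackException)
        {
            text = string.Empty;
            return false;
        }
    }

    public static void Main()
    {
        // "cafe" with an accented e, as a Latin-1 client would send it:
        // the lone 0xE9 byte is not valid UTF-8.
        byte[] latin1 = { 0x63, 0x61, 0x66, 0xE9 };

        string text;
        Console.WriteLine(TryDecode(latin1, out text));  // False

        // The default Encoding.UTF8 does not throw; it substitutes U+FFFD.
        Console.WriteLine(Encoding.UTF8.GetString(latin1).Contains("\uFFFD"));  // True
    }
}
```

A failed strict decode is a reasonable hint that the client is using some other encoding, which you would then have to identify separately.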

Edit:

Following up on the comments on this answer, here is a quote from "Can UTF-8 contain zero byte?":

"Yes, the zero byte in UTF-8 is code point 0, NUL. There is no other Unicode code point that will be encoded in UTF-8 with a zero byte anywhere within it."

Therefore it is safe to assume that a zero byte you receive is NUL, the terminator, and never part of the encoding of another code point.
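As a rough end-to-end check of the null-terminated approach, the sketch below reads two consecutive strings from an in-memory stream. It is an illustration under assumptions: the class name NullTerminatedDemo is made up, a MemoryStream stands in for the TCP stream, and BinaryReader.ReadByte() is used in place of the ReadByte(stream) helper from the question, whose implementation is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class NullTerminatedDemo
{
    // Reads bytes up to (and consuming) the first 0x00, then decodes them as UTF-8.
    // BinaryReader.ReadByte throws EndOfStreamException if the stream ends first.
    public static string ReadNT(BinaryReader reader)
    {
        var bytes = new List<byte>();
        byte b;
        while ((b = reader.ReadByte()) != 0x00)
            bytes.Add(b);
        return Encoding.UTF8.GetString(bytes.ToArray());
    }

    public static void Main()
    {
        // Two null-terminated UTF-8 strings back to back.
        var payload = new List<byte>();
        payload.AddRange(Encoding.UTF8.GetBytes("\u65E5\u672C\u8A9E")); // three CJK characters, 9 bytes
        payload.Add(0x00);
        payload.AddRange(Encoding.UTF8.GetBytes("na\u00EFve"));
        payload.Add(0x00);

        using (var reader = new BinaryReader(new MemoryStream(payload.ToArray())))
        {
            Console.WriteLine(ReadNT(reader) == "\u65E5\u672C\u8A9E"); // True
            Console.WriteLine(ReadNT(reader) == "na\u00EFve");         // True
        }
    }
}
```

Because no multi-byte sequence contains a zero byte, the terminator is found correctly even though the first string is nine bytes long for three characters.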

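You could also try the StreamReader class to decode the UTF-8 text. Note that StreamReader wraps a Stream, not a BinaryReader, and it has no method that stops at a null terminator: ReadToEnd reads until the stream ends (on a TCP connection, until the peer closes it), so this only fits when the string is the last thing sent.

public static string ReadNT(BinaryReader stream)
{
    // StreamReader needs the underlying Stream. It also buffers ahead, so avoid
    // further reads through the BinaryReader after this call.
    return new StreamReader(stream.BaseStream, Encoding.UTF8, false).ReadToEnd();
}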

If you control the protocol, you should also consider sending the length of the string (in bytes, not characters) along with the string itself. The reader then knows exactly how much to read and needs no terminator.

public static string ReadNT(BinaryReader stream, int length)
{
    return Encoding.UTF8.GetString(stream.ReadBytes(length));
}
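For completeness, here is a sketch of what both ends of a length-prefixed exchange might look like. The wire format is an assumption, not something from the original software: a 4-byte little-endian byte count (what BinaryWriter.Write(int) produces) followed by the UTF-8 bytes. The class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

public static class LengthPrefixedDemo
{
    // Sender side: byte count first, then the UTF-8 bytes.
    public static void WriteString(BinaryWriter writer, string value)
    {
        byte[] data = Encoding.UTF8.GetBytes(value);
        writer.Write(data.Length); // 4 bytes, little-endian
        writer.Write(data);        // raw bytes, no extra prefix
    }

    // Receiver side: read the count, then exactly that many bytes.
    public static string ReadString(BinaryReader reader)
    {
        int length = reader.ReadInt32();
        if (length < 0)
            throw new InvalidDataException("Negative length prefix.");

        // ReadBytes returns fewer bytes if the stream ends early, so check.
        byte[] data = reader.ReadBytes(length);
        if (data.Length != length)
            throw new EndOfStreamException("Stream ended before the full string arrived.");

        return Encoding.UTF8.GetString(data);
    }

    public static void Main()
    {
        var buffer = new MemoryStream(); // stands in for the network stream
        using (var writer = new BinaryWriter(buffer, Encoding.UTF8, leaveOpen: true))
        {
            WriteString(writer, "Gr\u00FC\u00DFe");
        }

        buffer.Position = 0;
        using (var reader = new BinaryReader(buffer))
        {
            Console.WriteLine(ReadString(reader) == "Gr\u00FC\u00DFe"); // True
        }
    }
}
```

In real code you would probably also cap the accepted length, since the value comes from the network and is used to size a buffer.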

