
Decoding a string in C#

I created a TCP server that distributes clients' messages, and I ran into a problem. When I send Cyrillic messages through the stream, they are not decoded properly. Does anyone know how I can fix that?

Here's the code for sending the message:

var message = Console.ReadLine().ToCharArray().Select(x => (byte)x).ToArray();
stream.Write(message);

Here's the code for receiving:

var numberOfBytes = stream.Read(buffer,0,1024);
Console.WriteLine($"{numberOfBytes} bytes received");
var chars = buffer.Select(x=>(char)x).ToArray();
var message = new string(chars);

The problem is that a char in C# is a 2-byte UTF-16 code unit. Cyrillic letters have values greater than 255 in UTF-16 (for example, 'Ж' is U+0416), so casting them to byte throws away the high byte and the information is lost.
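A small sketch that reproduces the data loss (the class and method names are made up for illustration). It runs the question's cast-based round trip and a UTF-8 round trip side by side. Since 'Ж' is U+0416, the cast keeps only 0x16, and the receiver ends up with the control character U+0016; plain ASCII text survives, which is why the bug only shows up with Cyrillic.

```csharp
using System.Linq;
using System.Text;

static class CastDemo
{
    // The question's approach: each char is cast to one byte and back,
    // so anything above U+00FF loses its high byte.
    public static string LossyRoundTrip(string text)
    {
        byte[] bytes = text.ToCharArray().Select(c => (byte)c).ToArray();
        return new string(bytes.Select(b => (char)b).ToArray());
    }

    // The same round trip through UTF-8, which keeps every character intact.
    public static string Utf8RoundTrip(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.UTF8.GetString(bytes);
    }
}
```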

To convert a string to a byte array, use the Encoding class:

byte[] buffer = System.Text.Encoding.UTF8.GetBytes(Console.ReadLine());

To convert it back to a string on the receiver's end, write:

// numberOfBytes is the value returned by stream.Read; decode only those bytes
string message = System.Text.Encoding.UTF8.GetString(buffer, 0, numberOfBytes);
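The receiver cannot assume one byte per character once UTF-8 is in use. A minimal sketch (hypothetical class name) that compares the string length with the number of bytes UTF-8 actually produces: ASCII letters take 1 byte each, while Cyrillic letters take 2.

```csharp
using System.Text;

static class Utf8Sizes
{
    // Number of chars (UTF-16 code units) in the string.
    public static int CharCount(string text) => text.Length;

    // Number of bytes the sender puts on the wire when it encodes with UTF-8.
    public static int Utf8ByteCount(string text) => Encoding.UTF8.GetByteCount(text);
}
```

For example, "Привет" is 6 chars but 12 bytes, while "Hello" is 5 chars and 5 bytes.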

Another problem is that Stream.Read does not guarantee that it reads all bytes of your message at once (the stream does not know that you send messages of a certain size; it only delivers a sequence of bytes). So it can happen, for example, that the last byte of the received array is only the first byte of a 2-byte UTF-8 character, and the other byte arrives with the next call to Stream.Read.
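A sketch of that failure, with hypothetical names: the UTF-8 bytes of "Привет" are cut after byte 3, in the middle of 'р', as if two Read calls had returned them. Decoding each chunk on its own turns the split letter into replacement characters (U+FFFD). A single System.Text.Decoder, which keeps an incomplete byte sequence between calls, reassembles it correctly; this is roughly what StreamReader does internally.

```csharp
using System.Text;

static class SplitDemo
{
    // Treats every chunk as a complete message, as the question's code does.
    public static string DecodeChunksNaively(byte[] first, byte[] second) =>
        Encoding.UTF8.GetString(first) + Encoding.UTF8.GetString(second);

    // Feeds both chunks through one stateful Decoder, which holds on to the
    // leftover byte from the first chunk and completes it with the second.
    public static string DecodeChunksWithDecoder(byte[] first, byte[] second)
    {
        Decoder decoder = Encoding.UTF8.GetDecoder();
        var sb = new StringBuilder();
        foreach (byte[] chunk in new[] { first, second })
        {
            // GetMaxCharCount leaves room for a character carried over from the previous chunk.
            char[] chars = new char[Encoding.UTF8.GetMaxCharCount(chunk.Length)];
            int charCount = decoder.GetChars(chunk, 0, chunk.Length, chars, 0);
            sb.Append(chars, 0, charCount);
        }
        return sb.ToString();
    }
}
```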

There are several solutions to this issue:

  1. Wrap the stream in a StreamWriter on the sender's end and in a StreamReader on the receiver's end. This is probably the simplest method if you only transmit text.
  2. Send the length of your message in bytes as an integer at the beginning of the message. This number tells the receiver how many bytes it has to read.
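Both options can be sketched as follows. The class and method names are hypothetical, and the 4-byte little-endian length prefix is an assumption: sender and receiver only have to agree on some fixed format. For option 1, create the writer and reader once per connection and reuse them, because a StreamReader buffers ahead and would lose data if it were recreated for every line.

```csharp
using System.Buffers.Binary;
using System.IO;
using System.Text;

static class Framing
{
    // UTF-8 without a byte order mark, so no extra bytes precede the first message.
    static readonly Encoding Utf8NoBom = new UTF8Encoding(false);

    // Option 1: text lines. leaveOpen keeps the underlying stream usable if a
    // wrapper is disposed; AutoFlush sends each WriteLine immediately.
    public static StreamWriter CreateWriter(Stream stream) =>
        new StreamWriter(stream, Utf8NoBom, 1024, leaveOpen: true) { AutoFlush = true };

    public static StreamReader CreateReader(Stream stream) =>
        new StreamReader(stream, Utf8NoBom, false, 1024, leaveOpen: true);

    // Option 2: a 4-byte little-endian prefix holding the payload length in bytes (not chars).
    public static void SendFramed(Stream stream, string message)
    {
        byte[] payload = Utf8NoBom.GetBytes(message);
        byte[] header = new byte[4];
        BinaryPrimitives.WriteInt32LittleEndian(header, payload.Length);
        stream.Write(header, 0, header.Length);
        stream.Write(payload, 0, payload.Length);
    }

    public static string ReceiveFramed(Stream stream)
    {
        byte[] header = ReadFully(stream, 4);
        int length = BinaryPrimitives.ReadInt32LittleEndian(header);
        return Utf8NoBom.GetString(ReadFully(stream, length));
    }

    // Keeps calling Read until `count` bytes have arrived, because a single
    // Read may return only part of what was sent.
    static byte[] ReadFully(Stream stream, int count)
    {
        byte[] result = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(result, offset, count - offset);
            if (read == 0)
                throw new EndOfStreamException("Stream ended in the middle of a message.");
            offset += read;
        }
        return result;
    }
}
```

On a real connection you would pass the network stream (for example a NetworkStream) instead of the MemoryStream used for testing. Option 1 is less code; option 2 also works for binary payloads and lets the receiver know up front how many bytes to expect.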

To convert a string to bytes, use System.Text.Encoding.GetBytes(string). I suggest you change the sending code to:

// using System.Text;
var messageAsBytes = Encoding.UTF8.GetBytes(Console.ReadLine());
stream.Write(messageAsBytes, 0, messageAsBytes.Length);

To convert bytes to a string, use System.Text.Encoding.GetString (the overload that takes an index and a count decodes only part of the buffer). If you receive UTF-8-encoded bytes:

// using System.Text;
// Decode only the bytes that stream.Read actually filled in.
var messageAsString = Encoding.UTF8.GetString(buffer, 0, numberOfBytes);
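Why the byte count matters on the receiving side: Read fills only the first numberOfBytes bytes of the 1024-byte buffer, and the rest stays zero, which decodes to '\0' characters. A small simulation with hypothetical names, where copying into the buffer stands in for stream.Read and the payload is assumed to fit in 1024 bytes:

```csharp
using System.Text;

static class ReceiveDemo
{
    // Simulates one read into a 1024-byte buffer, then decodes it two ways.
    public static (string whole, string counted) DecodeBoth(string sent)
    {
        byte[] buffer = new byte[1024];
        byte[] payload = Encoding.UTF8.GetBytes(sent);
        payload.CopyTo(buffer, 0);            // stands in for stream.Read(buffer, 0, 1024)
        int numberOfBytes = payload.Length;   // what Read would have returned

        string whole = Encoding.UTF8.GetString(buffer);                      // message plus trailing '\0's
        string counted = Encoding.UTF8.GetString(buffer, 0, numberOfBytes); // just the message
        return (whole, counted);
    }
}
```

For "Привет", the whole-buffer result is 1018 chars long (6 letters followed by 1012 '\0' chars), while the counted result is exactly "Привет".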

