TCP receiving extended ASCII or UTF-8 characters
For the inverted question mark ¿ I receive two bytes, [-62] [-65], but how would I get a readable UTF-8 or ASCII character encoding?
That is the UTF-8 code for that character. The inverted question mark is Unicode code point 191 (U+00BF) which, in UTF-8, is 0xc2:0xbf.
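As a quick check of that claim, here is a minimal Python sketch (Python is an assumption; the question does not say which language is in use). It encodes code point 191 and prints the resulting bytes in hex.

```python
# Encode the code point named in the answer (191, i.e. U+00BF) as UTF-8.
code_point = 191
char = chr(code_point)
encoded = char.encode("utf-8")

# Format each byte as two hex digits, joined the same way the answer writes them.
hex_form = ":".join(f"{b:02x}" for b in encoded)

print(char)      # the inverted question mark
print(hex_form)  # c2:bf
```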
You're seeing them as signed bytes. For example, -62 signed is 256-62, or 194 unsigned, which is hex 0xc2.
Similarly, -65 signed is 256-65, or 191 unsigned, which is hex 0xbf.
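The signed-to-unsigned arithmetic above can be sketched as follows. The helper name `to_unsigned` is made up for illustration, and Python is assumed only for the sake of the example; masking with 0xFF gives the same result as adding 256 to a negative byte.

```python
def to_unsigned(signed_byte: int) -> int:
    """Map a signed byte (-128..127) to its unsigned value (0..255)."""
    return signed_byte & 0xFF

# The two values reported in the question.
received = [-62, -65]

unsigned = [to_unsigned(b) for b in received]
hex_values = [f"0x{b:02x}" for b in unsigned]

print(unsigned)    # [194, 191]
print(hex_values)  # ['0xc2', '0xbf']
```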
If you want to convert your UTF-8 sequence into a code point, you can use the table below.
    Range              Encoding  Binary value
    -----------------  --------  --------------------------
    U+000000-U+00007f  0xxxxxxx  0xxxxxxx

    U+000080-U+0007ff  110yyyxx  00000yyy xxxxxxxx
                       10xxxxxx

    U+000800-U+00ffff  1110yyyy  yyyyyyyy xxxxxxxx
                       10yyyyxx
                       10xxxxxx

    U+010000-U+10ffff  11110zzz  000zzzzz yyyyyyyy xxxxxxxx
                       10zzyyyy
                       10yyyyxx
                       10xxxxxx
For example, your 0xc2:0xbf is binary 11000010 10111111, which matches the second case:
    11000010 10111111
       |||||   ||||||
       |||\\   //////
         ||| ||||||||
    00000000 10111111 -> 0x00bf -> 191
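The table lookup and bit shuffling shown above can be written out by hand. The function below is a hypothetical helper, in Python as an assumed language. It is a sketch of the idea only: it does not reject overlong encodings or surrogate code points, which a real decoder must do.

```python
def decode_utf8_sequence(data: bytes) -> int:
    """Decode a single UTF-8 sequence (1 to 4 bytes) into its code point."""
    first = data[0]
    # The lead byte's high bits give the sequence length and its payload bits.
    if first < 0x80:                 # 0xxxxxxx
        length, value = 1, first
    elif first >> 5 == 0b110:        # 110yyyxx
        length, value = 2, first & 0x1F
    elif first >> 4 == 0b1110:       # 1110yyyy
        length, value = 3, first & 0x0F
    elif first >> 3 == 0b11110:      # 11110zzz
        length, value = 4, first & 0x07
    else:
        raise ValueError("invalid lead byte")
    if len(data) != length:
        raise ValueError("wrong number of bytes for this lead byte")
    # Each continuation byte (10xxxxxx) contributes its low six bits.
    for cont in data[1:]:
        if cont >> 6 != 0b10:
            raise ValueError("invalid continuation byte")
        value = (value << 6) | (cont & 0x3F)
    return value

print(decode_utf8_sequence(bytes([0xC2, 0xBF])))  # 191
```

In practice the built-in `bytes.decode("utf-8")` does this work and also performs the validation this sketch leaves out.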
Those 2 bytes are probably UTF-8.
For ASCII you would need a specific codepage.
And what exactly is a 'readable' char encoding?
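To illustrate the codepage point: plain 7-bit ASCII has no inverted question mark at all, and single-byte "extended ASCII" codepages disagree about which byte represents it. The sketch below assumes Python and picks Latin-1 and CP437 purely as examples; nothing in the question says which codepage, if any, is in play.

```python
char = "\u00bf"  # inverted question mark

# Two single-byte codepages assign this character different byte values.
latin1_bytes = char.encode("latin-1")  # b'\xbf'
cp437_bytes = char.encode("cp437")     # b'\xa8'

# Strict ASCII stops at 0x7f, so the character cannot be encoded at all.
try:
    char.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(latin1_bytes, cp437_bytes, ascii_ok)
```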
Look at the byte values in hexadecimal: -62 is 0xC2 and -65 is 0xBF.
If you look up the Unicode information for the glyph in question, you can see that these are, indeed, the two bytes that make up the UTF-8 encoding of the inverted question mark glyph.
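That lookup can also be done programmatically. This is a sketch assuming Python and assuming the two signed values from the question are already in a list. It rebuilds the raw bytes, decodes them as UTF-8, and asks the standard `unicodedata` module for the character's name.

```python
import unicodedata

# Signed byte values as reported in the question.
received = [-62, -65]

# Mask to 0..255 so the values can be packed into a bytes object.
raw = bytes(b & 0xFF for b in received)

text = raw.decode("utf-8")
name = unicodedata.name(text)

print(raw.hex())             # c2bf
print(f"U+{ord(text):04X}")  # U+00BF
print(name)                  # INVERTED QUESTION MARK
```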