How to read a char that has an ASCII value in the range 128–130 and convert it to an int value
I have an array of chars, some of which have the decimal values 128 and 130. I am trying to read them as normal chars, but instead of 128 I get 8218 as an int (cast to a byte, I get 26). I need to get the number in the 128–130 range. I found some articles on encodings; some people say I need to use encoding 439.

Any ideas?
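For context on the 8218: that value is the Unicode code point U+201A ("single low-9 quotation mark"), which is what the Windows-1252 byte 0x82 (decimal 130) decodes to, and casting that char to a byte keeps only the low 8 bits, 0x1A, which is 26. A minimal C# illustration of the two conversions described above:

```csharp
using System;

class CharValueDemo
{
    static void Main()
    {
        // U+201A is what the Windows-1252 byte 0x82 (decimal 130) decodes to.
        char c = '\u201A';
        Console.WriteLine((int)c);    // 8218: the full 16-bit UTF-16 code unit
        Console.WriteLine((byte)c);   // 26: only the low 8 bits (0x1A) survive the cast
    }
}
```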
A char (System.Char) in the CLR environment is an unsigned 16-bit number, a UTF-16 code unit. From the Unicode Standard, Chapter 3, §3.9:
Code unit: The minimal bit combination that can represent a unit of encoded text for processing or interchange.

Code units are particular units of computer storage. Other character encoding standards typically use code units defined as 8-bit units, that is, octets. The Unicode Standard uses 8-bit code units in the UTF-8 encoding form, 16-bit code units in the UTF-16 encoding form, and 32-bit code units in the UTF-32 encoding form.

A code unit is also referred to as a code value in the information industry.

In the Unicode Standard, specific values of some code units cannot be used to represent an encoded character in isolation. This restriction applies to isolated surrogate code units in UTF-16 and to the bytes 80–FF in UTF-8. Similar restrictions apply for the implementations of other character encoding standards; for example, the bytes 81–9F, E0–FC in SJIS (Shift-JIS) cannot represent an encoded character by themselves.
Your "ASCII" text is no longer ASCII once it's in the CLR world. ASCII is a 7-bit encoding, and the code points 0x00–0x7F are preserved across all Unicode encodings (UTF-8, UTF-16, UTF-32) for the sake of compatibility. In the non-Unicode world, 0x80–0xFF have always had multiple character mappings (and don't even look at EBCDIC vs. ASCII). Some ASCII implementations used the eighth bit for parity as well: the high-order bit would be set to maintain the desired parity.
Presumably you're reading your "ASCII" text using a UTF-8 encoder/decoder (the CLR default). To get the numeric values you expect in your chars, you'll need to read the text using an encoder/decoder suitable for the encoding your text is actually in (Windows-1252? Something else?).
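A sketch of reading with an explicit encoding (the file path and data below are placeholders standing in for your real input). Note that if you literally want char values equal to the raw byte values 128–130, ISO-8859-1 (code page 28591) maps every byte 0x00–0xFF straight to the identical code point, whereas Windows-1252 remaps 0x80–0x9F to punctuation such as U+201A:

```csharp
using System;
using System.IO;
using System.Text;

class DecodeDemo
{
    static void Main()
    {
        // Write three raw octets to a temp file to stand in for the real data.
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[] { 128, 129, 130 });

        // ISO-8859-1 (code page 28591) decodes each byte to the same code point,
        // so the chars come back as 128, 129, 130.
        Encoding latin1 = Encoding.GetEncoding(28591);
        using (var reader = new StreamReader(path, latin1))
        {
            string text = reader.ReadToEnd();
            foreach (char ch in text)
                Console.WriteLine((int)ch);   // prints 128, 129, 130
        }
        File.Delete(path);
    }
}
```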
A better approach for you, perhaps, would be to read your text octet by octet as binary, using System.IO.FileStream rather than System.IO.TextReader and its minions. Then you've got the raw octets, and you can convert them to text as you wish, or do math on the raw octet values.
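The raw-octet approach can be sketched as follows (the temp file here is a stand-in for your actual input file):

```csharp
using System;
using System.IO;

class RawOctetDemo
{
    static void Main()
    {
        // Stand-in data file; in practice this would be your existing input.
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[] { 65, 128, 130 });

        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            int b;
            while ((b = fs.ReadByte()) != -1)   // -1 signals end of stream
                Console.WriteLine(b);           // raw octet value, 0-255, no decoding
        }
        File.Delete(path);
    }
}
```

Because no text decoding happens, bytes 128–130 come through exactly as the integers 128–130.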