
How to read a char that has ASCII value in range 128-130 and convert it to int value

I have an array of chars, some of which are ASCII 128 and 130 in decimal. I am trying to read them as normal chars, but instead of 128 I get 8218 as an int (cast to byte, I get 26). I need to get a number between 128 and 130. I found some articles on encodings, and some people say I need to use Encoding 439.

Any ideas?

A char (System.Char) in the CLR environment is an unsigned 16-bit number, a UTF-16 code unit. From the Unicode Standard, Chapter 3, §3.9:

Code unit: The minimal bit combination that can represent a unit of encoded text for processing or interchange.

  • Code units are particular units of computer storage. Other character encoding standards typically use code units defined as 8-bit units, that is, octets. The Unicode Standard uses 8-bit code units in the UTF-8 encoding form, 16-bit code units in the UTF-16 encoding form, and 32-bit code units in the UTF-32 encoding form.

  • A code unit is also referred to as a code value in the information industry.

  • In the Unicode Standard, specific values of some code units cannot be used to represent an encoded character in isolation. This restriction applies to isolated surrogate code units in UTF-16 and to the bytes 80–FF in UTF-8. Similar restrictions apply for the implementations of other character encoding standards; for example, the bytes 81–9F, E0–FC in SJIS (Shift-JIS) cannot represent an encoded character by themselves.
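To make the code-unit definition concrete, here is a small sketch that counts how many code units one character needs in each encoding form. It is written in Python rather than C# only to keep it short and runnable; the helper name `code_units` and the two sample characters are my own choices, not something from the question.

```python
def code_units(text: str, encoding: str, unit_bytes: int) -> int:
    """Count the code units needed to encode `text` in the given encoding form."""
    # The "-le" codec variants are used by the caller so no byte order mark is added.
    return len(text.encode(encoding)) // unit_bytes


euro = "\u20ac"       # EURO SIGN, inside the Basic Multilingual Plane
emoji = "\U0001F600"  # outside the BMP, so UTF-16 needs a surrogate pair

for label, ch in (("U+20AC", euro), ("U+1F600", emoji)):
    print(
        label,
        "UTF-8:", code_units(ch, "utf-8", 1),
        "UTF-16:", code_units(ch, "utf-16-le", 2),
        "UTF-32:", code_units(ch, "utf-32-le", 4),
    )
```

A CLR char corresponds to one of the 16-bit units counted in the UTF-16 column, which is why a single character outside the BMP occupies two chars.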

Your "ASCII" text is no longer ASCII once it's in the CLR world. ASCII is a 7-bit encoding, and the code points 0x00–0x7F are preserved across all Unicode encodings (UTF-8, -16, -32) for the sake of compatibility. In the non-Unicode world, 0x80–0xFF have always had multiple character mappings (and don't even look at EBCDIC vs. ASCII). Some ASCII implementations provided for parity as well: the high-order bit would be set to maintain the desired parity.

  • Even parity. The high-order bit is set to maintain an even number of 'on' bits in the octet.
  • Odd parity. The high-order bit is set to maintain an odd number of 'on' bits in the octet.
  • No parity. The high-order bit is never set.
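The parity schemes listed above can be sketched as follows. This is an illustration in Python, not code from any particular ASCII implementation, and the function name `with_parity` is hypothetical.

```python
def with_parity(code: int, even: bool = True) -> int:
    """Return a 7-bit ASCII code with the high-order bit set for even or odd parity."""
    if not 0 <= code <= 0x7F:
        raise ValueError("expected a 7-bit ASCII code")
    ones = bin(code).count("1")
    # Even parity: set the high bit when the count of 'on' bits is odd.
    # Odd parity: set the high bit when the count of 'on' bits is even.
    needs_high_bit = (ones % 2 == 1) if even else (ones % 2 == 0)
    return code | 0x80 if needs_high_bit else code


# 'A' is 0x41 (two 'on' bits); 'C' is 0x43 (three 'on' bits).
print(hex(with_parity(0x41, even=True)), hex(with_parity(0x41, even=False)))
print(hex(with_parity(0x43, even=True)), hex(with_parity(0x43, even=False)))
```

With parity in use, an octet above 0x7F may therefore still carry a plain 7-bit character, which is one more reason the high range is ambiguous.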

Presumably you're reading your "ASCII" text using a UTF-8 encoder/decoder (the CLR default). To get the numeric values you expect in your chars, you'll need to read the text using an encoder/decoder suitable for the encoding your text is actually in (Windows-1252? something else?).
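As a hedged illustration of how much the choice of decoder matters, the sketch below decodes the two octets from the question with three different encodings. It is Python, not .NET, and the codec names are Python's. One observation worth noting: 8218 is U+201A, the character Windows-1252 assigns to octet 130, and its low byte is 26, which matches the numbers reported in the question. That suggests a Windows-1252-style decoder may have been involved, though the thread does not confirm it.

```python
raw = bytes([128, 130])  # the two octets from the question

# Windows-1252 maps these octets to punctuation far outside the 0-255 range.
cp1252_values = [ord(ch) for ch in raw.decode("cp1252")]

# ISO-8859-1 (Latin-1) maps every octet to the code point with the same number.
latin1_values = [ord(ch) for ch in raw.decode("latin-1")]

# A strict UTF-8 decoder rejects these octets: they are not valid on their own.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(cp1252_values)  # [8364, 8218]
print(latin1_values)  # [128, 130]
print(utf8_ok)        # False
print(8218 & 0xFF)    # 26, the value seen after casting to byte
```

In .NET the analogous step would be to pick the matching System.Text.Encoding instance when decoding; which one is correct depends on how the text was actually produced.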

A better approach for you, perhaps, would be to read your text octet by octet as binary, using System.IO.FileStream, rather than System.IO.TextReader and its minions. Then you've got the raw octets and you can convert them to text as you wish, or do math on the raw octet values.
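The read-as-binary idea looks roughly like this. The sketch is in Python and stands in for what a System.IO.FileStream loop would do in C#; the helper `read_octets` and the temporary sample file are assumptions made purely for the demonstration.

```python
import os
import tempfile


def read_octets(path: str) -> list[int]:
    """Read a file in binary mode and return its octets as integers (0-255)."""
    with open(path, "rb") as stream:
        return list(stream.read())


# Write a tiny sample file containing 'A' followed by octets 128 and 130.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes([65, 128, 130]))
    sample_path = tmp.name

try:
    octets = read_octets(sample_path)
finally:
    os.remove(sample_path)

print(octets)  # [65, 128, 130]
```

Because no decoder is involved, the values come back exactly as stored, and any conversion to text becomes an explicit, separate step.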

