
Why do C++ streams use char instead of unsigned char?

I've always wondered why the C++ Standard Library instantiates basic_[io]stream and all its variants with the char type instead of the unsigned char type. Depending on whether char is signed, operations like get() can overflow or underflow, which leads to implementation-defined values for the variables involved. Another example is when you want to output a byte, unformatted, to an ostream using its put function.
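To make the concern concrete, here is a minimal sketch of reading one byte through a char-based stream and recovering its numeric value. The helper names first_byte_value and check_high_byte are made up for illustration, and the check assumes 8-bit bytes. The no-argument get() returns the stream's int_type, which can also hold the end-of-file marker, so the value is converted through unsigned char before use.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: returns the numeric value of the first byte in
// `data`, or -1 if `data` is empty.
int first_byte_value(const std::string& data) {
    std::istringstream in(data);
    typedef std::istringstream::traits_type traits;

    // get() returns int_type, wide enough for every char value plus EOF.
    traits::int_type v = in.get();
    if (traits::eq_int_type(v, traits::eof())) {
        return -1;
    }

    // Plain char may be signed, so go through unsigned char to obtain
    // a value in the range 0..UCHAR_MAX.
    char c = traits::to_char_type(v);
    return static_cast<unsigned char>(c);
}

// Assumes 8-bit bytes: the byte 0xFF reads back as 255, not as EOF.
void check_high_byte() {
    assert(first_byte_value(std::string(1, '\xFF')) == 0xFF);
}
```

The conversion through unsigned char is what keeps the result independent of whether plain char is signed on the platform.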

Any ideas?


Note: I'm still not really convinced. So if you know the definitive answer, please do still post it.

Possibly I've misunderstood the question, but conversion from unsigned char to char isn't unspecified, it's implementation-defined (4.7-3 in the C++ standard).
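As a rough illustration of that conversion, the sketch below round-trips a byte through plain char. The names round_trip and check_round_trip are hypothetical. The conversion to char is the implementation-defined step when the value does not fit (C++20 later fixed it to wrap modulo 2^N); the conversion back to unsigned char is always well-defined. The check assumes a two's complement platform, where the round trip is lossless.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: converts a byte to plain char and back.
unsigned char round_trip(unsigned char byte) {
    // Implementation-defined before C++20 if char is signed and
    // byte > CHAR_MAX; common implementations simply wrap around.
    char as_char = static_cast<char>(byte);

    // Conversion to an unsigned type is always modulo 2^CHAR_BIT.
    return static_cast<unsigned char>(as_char);
}

// Assumption: two's complement representation, so no value is lost.
void check_round_trip() {
    for (int v = 0; v <= UCHAR_MAX; ++v) {
        assert(round_trip(static_cast<unsigned char>(v)) == v);
    }
}
```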

The type of a 1-byte character in C++ is "char", not "unsigned char". This gives implementations a bit more freedom to do the best thing on the platform (for example, the standards body may have believed that there exist CPUs where signed byte arithmetic is faster than unsigned byte arithmetic, although that's speculation on my part). It also keeps compatibility with C. The result of removing this kind of existential uncertainty from C++ is C# ;-)

Given that the "char" type exists, I think it makes sense for the usual streams to use it even though its signedness isn't fixed by the standard. So maybe your question is answered by the answer to, "why didn't C++ just define char to be unsigned?"

I have always understood it this way: the purpose of the iostream class is to read and/or write a stream of characters, which, if you think about it, are abstract entities that the computer only represents through a character encoding. The C++ standard takes great pains to avoid pinning down the character encoding, saying only that "Objects declared as characters (char) shall be large enough to store any member of the implementation's basic character set," because it doesn't need to fix the implementation's basic character set in order to define the C++ language; the standard can leave the choice of character encoding to the implementation (the compiler together with a standard library implementation), and just note that char objects represent single characters in some encoding.

An implementation writer could choose a single-octet encoding such as ISO-8859-1 or even a double-octet encoding such as UCS-2. It doesn't matter. As long as a char object is "large enough to store any member of the implementation's basic character set" (note that this rules out variable-length encodings for the basic character set, since each member must fit in a single char), the implementation may even choose an encoding that represents basic Latin in a way that is incompatible with any common encoding!

It is confusing that the char, signed char, and unsigned char types share "char" in their names, but it is important to keep in mind that char does not belong to the same family of fundamental types as signed char and unsigned char. signed char is in the family of signed integer types:

There are four signed integer types: "signed char", "short int", "int", and "long int."

and unsigned char is in the family of unsigned integer types:

For each of the signed integer types, there exists a corresponding (but different) unsigned integer type: "unsigned char", "unsigned short int", "unsigned int", and "unsigned long int," ...
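The fact that these are three distinct types can be checked at compile time. The following is a small sketch using std::is_same and overload resolution; the names kind, plain_char_is_signed, and check_overloads are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// char is a distinct type from both signed char and unsigned char,
// even though it has the same representation as one of them.
static_assert(!std::is_same<char, signed char>::value,
              "char and signed char are different types");
static_assert(!std::is_same<char, unsigned char>::value,
              "char and unsigned char are different types");
static_assert(sizeof(char) == sizeof(signed char) &&
              sizeof(char) == sizeof(unsigned char),
              "all three occupy the same amount of storage");

// Because the types are distinct, all three overloads can coexist.
std::string kind(char)          { return "char"; }
std::string kind(signed char)   { return "signed char"; }
std::string kind(unsigned char) { return "unsigned char"; }

// Whether plain char is signed is implementation-defined.
constexpr bool plain_char_is_signed() {
    return std::is_signed<char>::value;
}

// A character literal has type char, so it selects the first overload.
void check_overloads() {
    assert(kind('a') == "char");
    assert(kind(static_cast<signed char>('a')) == "signed char");
    assert(kind(static_cast<unsigned char>('a')) == "unsigned char");
}
```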

The one similarity between the char, signed char, and unsigned char types is that "[they] occupy the same amount of storage and have the same alignment requirements". Thus, you can reinterpret_cast from char * to unsigned char * in order to determine the numeric value of a character in the execution character set.
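A minimal sketch of that cast follows. The function name code_units and its check are made up for this example, and the check assumes 8-bit bytes. Viewing the characters through an unsigned char pointer yields non-negative values regardless of whether plain char is signed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the numeric value of every char in `s`.
std::vector<unsigned> code_units(const std::string& s) {
    // Accessing char objects through unsigned char* is permitted, and
    // both types have the same size and alignment.
    const unsigned char* p =
        reinterpret_cast<const unsigned char*>(s.data());

    std::vector<unsigned> values;
    for (std::size_t i = 0; i < s.size(); ++i) {
        values.push_back(p[i]);
    }
    return values;
}

// Assumes 8-bit bytes: 0xE9 stays 233 even where plain char is signed.
void check_code_units() {
    std::vector<unsigned> v = code_units("A\xE9");
    assert(v.size() == 2);
    assert(v[0] == static_cast<unsigned>('A'));
    assert(v[1] == 0xE9u);
}
```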

To answer your question, the reason the standard library uses char as the default type is that the standard streams are meant for reading and/or writing streams of characters, represented by char objects, not integers (signed char and unsigned char). The use of char versus the numeric value is a way of separating concerns.

char is for characters, unsigned char is for raw bytes of data, and signed char is for, well, signed data.
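One way to keep that separation while still pushing raw bytes through a char-based stream is to hold the data as unsigned char and convert only at the stream boundary. The sketch below shows the idea; write_bytes, read_bytes, and check_byte_round_trip are hypothetical names, and a real program writing to a file would presumably also open it with std::ios::binary.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical helper: writes raw bytes, unformatted, to a char stream.
void write_bytes(std::ostream& out, const std::vector<unsigned char>& bytes) {
    out.write(reinterpret_cast<const char*>(bytes.data()),
              static_cast<std::streamsize>(bytes.size()));
}

// Hypothetical helper: reads every remaining byte from a char stream.
std::vector<unsigned char> read_bytes(std::istream& in) {
    std::vector<unsigned char> bytes;
    char c;
    while (in.get(c)) {
        // Convert at the boundary so values are always non-negative.
        bytes.push_back(static_cast<unsigned char>(c));
    }
    return bytes;
}

// Bytes above 0x7F survive the trip through the char-based stream.
void check_byte_round_trip() {
    std::vector<unsigned char> data = {0x00, 0x7F, 0x80, 0xFF};
    std::ostringstream out;
    write_bytes(out, data);
    std::istringstream in(out.str());
    assert(read_bytes(in) == data);
}
```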

The standard does not specify whether char is signed or unsigned; that is compiler-specific. It only specifies that "char" will be large enough to hold characters on your system, as characters were understood in those days, that is, without Unicode.

Using "char" for characters is the standard way to go. Using unsigned char is a hack, although it will match the compiler's implementation of char on platforms where plain char is unsigned.

I think this comment explains it well. To quote:

signed char and unsigned char are arithmetic, integral types just like int and unsigned int. On the other hand, char is expressly intended to be the "I/O" type that represents some opaque, system-specific fundamental unit of data on your platform. I would use them in this spirit.
