Why do C++ streams use char instead of unsigned char?
I've always wondered why the C++ standard library instantiates basic_[io]stream and all its variants with the char type instead of the unsigned char type. Using char means that, depending on whether it is signed or not, you can get overflow and underflow in operations like get(), which leads to implementation-defined values of the variables involved. Another example is when you want to output a byte, unformatted, to an ostream using its put function.
Any ideas?
Note: I'm still not really convinced, so if you know the definitive answer, you can still post it.
Possibly I've misunderstood the question, but conversion from unsigned char to char isn't unspecified, it's implementation-defined (4.7-3 in the C++ standard).
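To make the put()/get() concern concrete, here is a minimal sketch that writes one raw byte to a stream and reads it back. The helper name round_trip_byte is hypothetical and used only for illustration. The sketch relies on get() returning the traits' int_type rather than char, which is how a byte such as 0xFF stays distinguishable from end-of-file.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: writes one raw byte with put() and reads it back
// with get(), returning the int that get() reports for that byte.
int round_trip_byte(unsigned char byte) {
    std::stringstream ss;
    // put() takes a char, so the byte is converted explicitly. When char is
    // signed and byte > 127, the resulting char value is implementation-defined
    // before C++20 and defined as a modular conversion from C++20 on.
    ss.put(static_cast<char>(byte));
    // get() returns int_type (int for char streams), not char, so that
    // end-of-file can be reported as a value distinct from every character.
    return ss.get();
}
```

On the implementations I am aware of, round_trip_byte(0xFF) yields 255, because char_traits<char>::to_int_type goes through unsigned char. Treat the exact value as an assumption about your library; the guarantee is only that it differs from eof().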
The type of a 1-byte character in C++ is "char", not "unsigned char". This gives implementations a bit more freedom to do the best thing on the platform (for example, the standards body may have believed that there exist CPUs where signed byte arithmetic is faster than unsigned byte arithmetic, although that's speculation on my part). Also for compatibility with C. The result of removing this kind of existential uncertainty from C++ is C# ;-)
Given that the "char" type exists, I think it makes sense for the usual streams to use it even though its signedness isn't defined. So maybe your question is answered by the answer to, "why didn't C++ just define char to be unsigned?"
I have always understood it this way: the purpose of the iostream class is to read and/or write a stream of characters, which, if you think about it, are abstract entities that the computer only represents using a character encoding. The C++ standard takes great pains to avoid pinning down the character encoding, saying only that "Objects declared as characters (char) shall be large enough to store any member of the implementation's basic character set," because it does not need to fix the implementation's basic character set in order to define the C++ language; the standard can leave the choice of character encoding to the implementation (the compiler together with a standard library implementation), and just note that char objects represent single characters in some encoding.
An implementation writer could choose a single-octet encoding such as ISO-8859-1 or even a double-octet encoding such as UCS-2. It doesn't matter. As long as a char object is "large enough to store any member of the implementation's basic character set" (note that this forbids a variable-length encoding of the basic character set), the implementation may even choose an encoding that represents basic Latin in a way that is incompatible with any common encoding!
It is confusing that the char, signed char, and unsigned char types share "char" in their names, but it is important to keep in mind that char does not belong to the same family of fundamental types as signed char and unsigned char. signed char is in the family of signed integer types:
There are four signed integer types: "signed char", "short int", "int", and "long int."
and unsigned char is in the family of unsigned integer types:
For each of the signed integer types, there exists a corresponding (but different) unsigned integer type: "unsigned char", "unsigned short int", "unsigned int", and "unsigned long int," ...
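As a quick check of the point that char is its own type rather than an alias for either of the other two, the following sketch uses std::is_same and a small overload set. The function name kind is hypothetical and exists only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// char is a distinct type from both signed char and unsigned char,
// even though its range matches one of them on any given implementation.
static_assert(!std::is_same<char, signed char>::value, "char is not signed char");
static_assert(!std::is_same<char, unsigned char>::value, "char is not unsigned char");

// All three occupy exactly one byte.
static_assert(sizeof(char) == 1 && sizeof(signed char) == 1 &&
              sizeof(unsigned char) == 1, "one byte each");

// Hypothetical overload set: because the three types are distinct,
// each argument type selects its own overload without ambiguity.
inline std::string kind(char) { return "char"; }
inline std::string kind(signed char) { return "signed char"; }
inline std::string kind(unsigned char) { return "unsigned char"; }
```

Calling kind('a') selects the char overload, since a character literal has type char in C++.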
The one similarity between the char, signed char, and unsigned char types is that "[they] occupy the same amount of storage and have the same alignment requirements". Thus, you can reinterpret_cast from char * to unsigned char * in order to determine the numeric value of a character in the execution character set.
To answer your question, the reason the standard library uses char as the default type is that the standard streams are meant for reading and/or writing streams of characters, represented by char objects, not integers (signed char and unsigned char). The use of char versus the numeric value is a way of separating concerns.
char is for characters, unsigned char for raw bytes of data, and signed char for, well, signed data.
The standard does not specify whether char is signed or unsigned; that is implementation-defined. It only specifies that char is large enough to hold the characters on your system, as characters were understood in those days, that is, without Unicode.
Using "char" for characters is the standard way to go. Using unsigned char is a hack, although it will match the compiler's implementation of char on platforms where plain char is unsigned.
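Since the signedness of plain char varies between implementations, it can be queried rather than assumed. A minimal sketch follows, with hypothetical helper names, using only std::numeric_limits and the <climits> macros.

```cpp
#include <cassert>
#include <climits>
#include <limits>

// Hypothetical helper: reports whether plain char is signed here.
inline bool plain_char_is_signed() {
    return std::numeric_limits<char>::is_signed;
}

// Range of plain char as seen through <climits>: CHAR_MIN is 0 when
// plain char is unsigned and negative when it is signed.
inline int plain_char_min() { return CHAR_MIN; }
inline int plain_char_max() { return CHAR_MAX; }
```

Code that depends on the answer is usually better served by converting to unsigned char explicitly than by branching on these values.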
I think this comment explains it well. To quote:
signed char and unsigned char are arithmetic, integral types just like int and unsigned int. On the other hand, char is expressly intended to be the "I/O" type that represents some opaque, system-specific fundamental unit of data on your platform. I would use them in this spirit.
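Statement: Technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce this content, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.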