Why don't the C or C++ standards explicitly define char as signed or unsigned?

int main()
{
    char c = 0xff;
    bool b = 0xff == c;
    // Under most C/C++ compilers' default options, b is FALSE!!!
}

Neither the C nor the C++ standard specifies whether char is signed or unsigned; it is implementation-defined.

Why do the C and C++ standards not explicitly define char as signed or unsigned, to avoid dangerous misuses like the code above?

Historical reasons, mostly.

Expressions of type char are promoted to int in most contexts (because a lot of CPUs don't have 8-bit arithmetic operations). On some systems, sign extension is the most efficient way to do this, which argues for making plain char signed.

On the other hand, the EBCDIC character set has basic characters with the high-order bit set (i.e., characters with values of 128 or greater); on EBCDIC platforms, char pretty much has to be unsigned.

The ANSI C Rationale (for the 1989 standard) doesn't have a lot to say on the subject; section 3.1.2.5 says:

Three types of char are specified: signed, plain, and unsigned. A plain char may be represented as either signed or unsigned, depending upon the implementation, as in prior practice. The type signed char was introduced to make available a one-byte signed integer type on those systems which implement plain char as unsigned. For reasons of symmetry, the keyword signed is allowed as part of the type name of other integral types.

Going back even further, an early version of the C Reference Manual from 1975 says:

A char object may be used anywhere an int may be. In all cases the char is converted to an int by propagating its sign through the upper 8 bits of the resultant integer. This is consistent with the two's complement representation used for both characters and integers. (However, the sign-propagation feature disappears in other implementations.)

This description is more implementation-specific than what we see in later documents, but it does acknowledge that char may be either signed or unsigned. On the "other implementations" on which "the sign-propagation feature disappears", the promotion of a char object to int would have zero-extended the 8-bit representation, essentially treating it as an 8-bit unsigned quantity. (The language didn't yet have the signed or unsigned keyword.)

C's immediate predecessor was a language called B. B was a typeless language, so the question of char being signed or unsigned did not apply. For more information about the early history of C, see the late Dennis Ritchie's home page, which has since moved.

As for what's happening in your code (applying modern C rules):

char c = 0xff;
bool b = 0xff == c;

If plain char is unsigned, then the initialization of c sets it to (char)0xff, which compares equal to 0xff in the second line. But if plain char is signed, then 0xff (an expression of type int) is converted to char, but since 0xff exceeds CHAR_MAX (assuming CHAR_BIT==8), the result is implementation-defined. In most implementations, the result is -1. In the comparison 0xff == c, both operands are converted to int, making it equivalent to 0xff == -1, or 255 == -1, which is of course false.

Another important thing to note is that unsigned char, signed char, and (plain) char are three distinct types. char has the same representation as either unsigned char or signed char; it's implementation-defined which one it is. (On the other hand, signed int and int are two names for the same type; unsigned int is a distinct type. (Except that, just to add to the frivolity, it's implementation-defined whether a bit field declared as plain int is signed or unsigned.))

Yes, it's all a bit of a mess, and I'm sure it would have been defined differently if C were being designed from scratch today. But each revision of the C language has had to avoid breaking (too much) existing code, and to a lesser extent existing implementations.

char was originally meant to store characters, so whether it was signed or unsigned was not important. What really matters is how to perform arithmetic on char efficiently. So, depending on the system, the compiler chooses whichever is most appropriate.

Prior to ARMv4, ARM had no native support for loading halfwords and signed bytes. To load a signed byte you had to LDRB and then sign-extend the value (LSL it up, then ASR it back down). This is painful, so char is unsigned by default.

See also: Why are unsigned types more efficient on ARM CPUs?

In fact, a lot of ARM compilers still use unsigned char by default, because even though you can load a byte with sign extension on modern ARM ISAs, that instruction is still less flexible than the zero-extension version.

Most modern compilers also let you change char's signedness instead of using the default setting (GCC and Clang, for example, accept -fsigned-char and -funsigned-char).
