The signedness of char in g++/gcc and its history

First let me start off by saying that I know char, signed char and unsigned char are different types in C++. From a quick reading of the standard, it also appears that whether char is signed is implementation-defined. And to make things just a little more fun, it appears g++ decides whether char is signed on a per-platform basis!

So, with that background, let me introduce a bug I've run into, shown in this toy program:

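#include <stdio.h>

int main(int argc, char* argv[])
{
    char array[512];
    int i;
    char* aptr = array + 256;   /* points at the middle element */

    for(i=0; i != 512; i++) {
        array[i] = 0;
    }

    aptr[0] = 0xFF;             /* typically -1 if plain char is signed, 255 if unsigned */
    aptr[-1] = -1;              /* same element as array[255] */
    aptr[0xFF] = 1;             /* same element as array[511] */
    printf("%d\n", aptr[aptr[0]]);                 /* index depends on char's signedness */
    printf("%d\n", aptr[(unsigned char)aptr[0]]);  /* index 255 with 8-bit char */

    return 0;
}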

The intended behavior is that both calls to printf should output 1. Of course, what actually happens with gcc and g++ 4.6.3 on linux/x86_64 is that the first printf outputs -1 while the second outputs 1. This is consistent with char being signed and g++ sensibly interpreting the negative array index of -1 (which is technically undefined behavior).

The bug seems easy enough to fix: I just need to cast the char to unsigned char, as shown above. What I want to know is whether this code was ever expected to work correctly on x86 or x86_64 machines using gcc/g++? It appears it may work as intended on ARM platforms, where apparently char is unsigned, but I would like to know whether this code has always been buggy on x86 machines using g++?

I see no undefined behavior in your program. Negative array indices are not necessarily invalid, as long as the result of adding the index to the prefix (the pointer expression before the brackets) refers to a valid memory location. (A negative array index is invalid, i.e., has undefined behavior, if the prefix is the name of an array object or a pointer to the 0th element of an array object, but that's not the case here.)

In this case, aptr points to element 256 of a 512-element array, so the valid indices go from -256 to +255 (+256 yields a valid address just past the end of the array, but it can't be dereferenced). Assuming CHAR_BIT == 8, the range of each of signed char, unsigned char, and plain char is a subset of the array's valid index range.

If plain char is signed, then this:

aptr[0] = 0xFF;

will implicitly convert the int value 0xFF (255) to char. The result of that conversion is implementation-defined, but it will be within the range of plain char, and it will almost certainly be -1. If plain char is unsigned, the assignment stores the value 255 in aptr[0]. So the behavior of the code depends on the signedness of plain char (and possibly on the implementation-defined result of converting an out-of-range value to a signed type), but there is no undefined behavior.

(Converting an out-of-range value to a signed type may also, starting with C99, raise an implementation-defined signal, but I know of no implementation that actually does that. Raising a signal on a conversion of 0xFF to char would probably break existing code, so compiler writers are highly motivated not to do that.)

The element type of an array has nothing to do with the indices (except in how the underlying memory is accessed).

For example:

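signed int a[25] = {0};
unsigned int b[25] = {0};
signed int *pa = a + 10;     /* interior pointers, so the negative */
unsigned int *pb = b + 10;   /* indices below stay inside the arrays */

int value = pa[-1];              /* same element as a[9] */
unsigned int u_value = pb[-5];   /* same element as b[5] */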

The indexing formula for both cases is:

memory_address = base_address
               + index * sizeof(element_type);

As far as char goes, its size is 1 regardless: sizeof(char) is 1 by definition in the language specifications.

How a char behaves in arithmetic expressions, including index expressions, depends on whether it is signed or unsigned, because its value is promoted to int before it is used.

The intended behavior is that both calls to printf should output 1

Are you sure?

The value of aptr[0] is a plain char, which is signed on x86, so it is -1. That value is then used as an index into aptr, so the first printf() prints aptr[-1], which is -1.

The second printf() works the same way, except that the type cast makes the value be interpreted as an unsigned char, so you end up with 255. Using that as an index into aptr gives aptr[255], so the second printf() prints 1.

I believe your assumption about the expected behavior is incorrect.

Edit 1:

It appears this may work as intended on ARM platforms, where apparently char is unsigned, but I would like to know whether this code has always been buggy on x86 machines using g++?

Based on this statement, it seems that you know char is signed on x86 (contrary to what some people assumed you assumed). So the explanation I provided holds, i.e., treating plain char as signed char on x86.

Edit 2:

Using a negative array index is perfectly fine as long as the pointer operand is to an interior element: stackoverflow.com/questions/3473675/negative-array-indexes-in-c – ecatmur

This is one of @ecatmur's comments on the question. It clarifies that a negative index is fine, contrary to what some people think.

Your printf statements are the same as:

printf("%d\n", aptr[(char)255]);
printf("%d\n", aptr[(unsigned char)(char)255]);

So the output obviously depends on how the platform performs these conversions.

What I want to know is whether this code was ever expected to work correctly on x86 or x86_64 machines using gcc/g++?

Taking 'correctly' to mean the behavior you describe: no, it should never have been expected to behave that way on a platform where char is signed.

When char is signed (and cannot represent 255), the conversion gives you an implementation-defined value within the representable range. For an 8-bit, two's-complement representation, that is some value in the range [-128, 127]. Therefore the only possible outputs of:

printf("%d\n", aptr[(char)255]);

are "0" and "-1" (ignoring cases where printf fails). The common implementation-defined conversion (reducing 255 modulo 256 to -1) results in printing "-1".


The code is well defined but not portable between implementations whose plain char differs in signedness. Writing portable code includes not depending on char being signed or unsigned, which in turn means you should only use char values as array indices if the indices are limited to the range [0, 127].
