
Unsigned and Signed Extension

Can someone explain the following code output to me:

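#include <stdio.h>

void myprint(unsigned long a)
{
    printf("Input is %lx\n", a);
}

int main(void)
{
    myprint(1 << 31);      /* left operand is a signed int */
    myprint(0x80000000);   /* hexadecimal integer constant */
    return 0;
}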

Output with gcc main.c:

Input is ffffffff80000000
Input is 80000000

Why is (1 << 31) treated as signed and 0x80000000 treated as unsigned?

In C, the result of an expression depends on the types of the operands (or some of the operands). In particular, 1 is an int (signed), therefore 1 << n is also an int.

The type (including signedness) of 0x80000000 is determined by the rules for integer constants, and it depends on the size of int and the other integer types on your system, which you haven't specified. A type is chosen such that 0x80000000 (a large positive number) is in range for that type.

In case you have any misconception: the literal 0x80000000 is a large positive number. People sometimes mistakenly equate it to a negative number, mixing up values with representations.

In your question you ask why 0x80000000 is treated as unsigned. However, your code does not actually rely on the signedness of 0x80000000. The only thing you do with it is pass it to a function that takes an unsigned long parameter. So whether it is signed or unsigned doesn't matter; when passed to the function it is converted to an unsigned long with the same value. (Since 0x80000000 is within the minimum guaranteed range for unsigned long, there is no chance of it being out of range.)

So, that's 0x80000000 dealt with. What about 1 << 31? If your system has a 32-bit int (or narrower), this causes undefined behaviour due to signed arithmetic overflow. If your system has a larger int, it produces the same output as the 0x80000000 line.

If you use 1u << 31 instead, and you have 32-bit ints, then there is no undefined behaviour and you are guaranteed to see the program output 80000000 twice.

Since your output was not 80000000, we can conclude that your system has a 32-bit (or narrower) int, and that your program actually causes undefined behaviour. The type of 0x80000000 would be unsigned int if int is 32-bit; with a narrower int it would be long or unsigned long, whichever is the first that can represent the value.

Why is (1 << 31) treated as signed and 0x80000000 treated as unsigned?

From 6.5.7 Bitwise shift operators in the C11 spec:

3 The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. [...]
4 The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

So, because 1 is an int (per section 6.4.4.1, quoted below), 1 << 31 is also an int, and its value is not well defined on systems where int is 32 bits or narrower. (It may even trap.)


From 6.4.4.1 Integer constants:

3 A decimal constant begins with a nonzero digit and consists of a sequence of decimal digits. An octal constant consists of the prefix 0 optionally followed by a sequence of the digits 0 through 7 only. A hexadecimal constant consists of the prefix 0x or 0X followed by a sequence of the decimal digits and the letters a (or A) through f (or F) with values 10 through 15 respectively.

and

5 The type of an integer constant is the first of the corresponding list in which its value can be represented.

Suffix   | Decimal Constant | Hex Constant
---------+------------------+------------------------
none     | int              | int
         |                  | unsigned int
         | long int         | long int
         |                  | unsigned long int
         | long long int    | long long int
         |                  | unsigned long long int
---------+------------------+------------------------
u or U   | unsigned int     | unsigned int
[...]    | [...]            | [...]

So, on a system where int is 32 bits or narrower and unsigned int is 32 bits or wider, 0x80000000 is an unsigned int.

You are apparently using a system with 32-bit int and unsigned int.

1 fits into an int, so it is a signed int; 0x80000000 does not. For decimal constants, the next larger signed type that can hold the value is used. For hexadecimal and octal constants, the corresponding unsigned type is tried first, if the value fits. This is because such constants are commonly used as unsigned values anyway. See the C standard, 6.4.4.1p5, for the complete value/type matrix.

For signed integers, a left shift that changes the sign is undefined behaviour. This implies all bets are off, because you are outside the language specification.

That said, the following is an interpretation of the results:

  • long is apparently 64 bits on your system.
  • The shift moved the 1 into the sign bit of the int, as you might have expected.
  • This results in a negative int.
  • Negative ints are converted to unsigned such that, with a 2's complement representation, no operation is needed (the bit pattern is simply reinterpreted).
  • As you use a 64-bit unsigned long, the sign is extended into the upper bits of the argument to myprint.

How to avoid it:

  • Always use unsigned integers when shifting (e.g. append the U suffix to integer constants where appropriate; here: 1U or 0x1U).
  • Be aware of the standard integer conversions when using types smaller than int.
  • In general, if you need a specific size, you should definitely use the stdint.h fixed-width types. Note that the standard integer types have no defined bit width. For 32 bits, use uint32_t for variables. For constants, use the macros: UINT32_C(1) (without suffix!).

My thought: the argument to the first call to myprint() is an expression, so it has to be calculated at runtime. The compiler is therefore required to interpret it (via generated instructions) as a signed int left shift, producing a negative signed int, which is then sign-extended to fill a long and reinterpreted as unsigned long. (I think this might be a compiler error?)

By contrast, the argument to the second call to myprint() is a hard-coded integer constant, passed to a routine taking unsigned long as its argument. I think the compiler is written to assume from this context that the constant is already an unsigned long, since there is no overt conflicting type information.

Correct me if I am wrong. This is what I have understood.

On my machine, as MM said, sizeof(int) = 4. (Confirmed by printing sizeof(int).)

So, 1 << 31 becomes (signed) 0x80000000, as 1 is signed. But 0x80000000 becomes unsigned, as it can't fit in a signed int (because it is treated as positive, and the maximum positive value of an int is 0x7fffffff).

So when a signed int is converted to long, sign extension takes place (the upper bits are filled with the sign bit). And when an unsigned int is converted, it is extended with 0's.

So that is why there are extra 1's in the case of myprint(1 << 31), and why this is not the case in either of the following:

1) myprint(1u << 31)

2) myprint(1 << 31) when int is wider than 32 bits, because in that case the sign bit is not 1.
