Why is -(-2147483648) = -2147483648 in a 32-bit machine?

I think the question is self-explanatory. I guess it probably has something to do with overflow, but I still do not quite get it. What is happening, bitwise, under the hood?

Why does -(-2147483648) = -2147483648 (at least when compiled as C)?

Negating an (unsuffixed) integer constant:

The expression -(-2147483648) is perfectly well defined in C, although it may not be obvious why.

When you write -2147483648, it is parsed as the unary minus operator applied to the integer constant 2147483648. If 2147483648 cannot be represented as int, then it is represented as long or long long* (whichever fits first), where the latter type is guaranteed by the C Standard to cover that value†.

To confirm that, you could examine it with:

printf("%zu\n", sizeof(-2147483648));

which yields 8 on my machine.

The next step is to apply the second - operator, in which case the final value is 2147483648L (assuming that the constant was eventually represented as long). If you try to assign it to an int object, as follows:

int n = -(-2147483648);

then the actual behavior is implementation-defined. Referring to the Standard:

C11 §6.3.1.3/3 Signed and unsigned integers

Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.

The most common approach is simply to cut off the higher bits. For instance, GCC documents it as:

For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; no signal is raised.

Conceptually, the conversion to a type of width 32 can be illustrated by a bitwise AND operation (here 2^32 means 2 to the power 32, not C's XOR operator):

value & (2^32 - 1) // preserve 32 least significant bits

In accordance with two's complement arithmetic, the resulting value of n has all bits zero except the MSB (sign) bit, which represents the value -2^31, that is, -2147483648.

Negating an int object:

If you try to negate an int object that holds the value -2147483648, then, assuming a two's complement machine, the program exhibits undefined behavior:

n = -n; // UB if n == INT_MIN and INT_MAX == 2147483647

C11 §6.5/5 Expressions

If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.

*) In the withdrawn C90 Standard, there was no long long type and the rules were different. Specifically, the sequence of candidate types for an unsuffixed decimal constant was int, long int, unsigned long int (C90 §6.1.3.2 Integer constants).

†) This is due to LLONG_MAX, which must be at least +9223372036854775807 (C11 §5.2.4.2.1/1).

Note: this answer does not apply as such to the obsolete ISO C90 standard, which is still used by many compilers.

First of all, in C99 and C11, the expression -(-2147483648) == -2147483648 is in fact false:

int is_it_true = (-(-2147483648) == -2147483648);
printf("%d\n", is_it_true);

prints

0

So how is it possible that this evaluates to true? The machine is using 32-bit two's complement integers. 2147483648 is an integer constant that doesn't quite fit in a 32-bit int, so it will be either long int or long long int, whichever is the first in which it fits. Negating it results in -2147483648, and again, even though the number -2147483648 fits in a 32-bit integer, the expression -2147483648 consists of a >32-bit positive integer preceded by unary -!

You can try the following program:

#include <stdio.h>

int main(void) {
    /* sizeof yields a size_t, hence the %zu conversion */
    printf("%zu\n", sizeof(2147483647));   /* fits in a 32-bit int */
    printf("%zu\n", sizeof(2147483648));   /* needs long or long long */
    printf("%zu\n", sizeof(-2147483648));  /* unary minus keeps the wider type */
}

The output on such a machine would most probably be 4, 8 and 8.

Now, -2147483648 negated will again result in +2147483648, which is still of type long int or long long int, and everything is fine.

In C99 and C11, the integer constant expression -(-2147483648) is well defined on all conforming implementations.


Now, when this value is assigned to a variable of type int with 32 bits and two's complement representation, the value is not representable in it: the values of a 32-bit two's complement int range from -2147483648 to 2147483647.

The C11 standard 6.3.1.3p3 says the following about integer conversions:

  • [When] the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.

That is, the C standard doesn't actually define what the value would be in this case, nor does it preclude the possibility that execution of the program stops due to a signal being raised; it leaves it to the implementations (i.e. compilers) to decide how to handle it (C11 3.4.1):

implementation-defined behavior

unspecified behavior where each implementation documents how the choice is made

and (3.19.1):

implementation-defined value

unspecified value where each implementation documents how the choice is made


In your case, the implementation-defined behaviour is that the value is the 32 lowest-order bits [*]. The (long) long int value 0x80000000 has bit 31 set and all other bits cleared. In a 32-bit two's complement integer, bit 31 is the sign bit, meaning that the number is negative; with all value bits zero, the value is the minimum representable number, i.e. INT_MIN.


[*] GCC documents its implementation-defined behaviour in this case as follows:

The result of, or the signal raised by, converting an integer to a signed integer type when the value cannot be represented in an object of that type (C90 6.2.1.2, C99 and C11 6.3.1.3).

For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; no signal is raised.

This is not a C question, for on a C implementation featuring 32-bit two's complement representation for type int, the effect of applying the unary negation operator to an int having the value -2147483648 is undefined. That is, the C language specifically disavows designating the result of evaluating such an operation.

Consider more generally, however, how the unary - operator is defined in two's complement arithmetic: the additive inverse of a positive number x is formed by flipping all the bits of its binary representation and adding 1. This same definition serves as well for any negative number that has at least one bit other than its sign bit set.

Minor problems arise, however, for the two numbers that have no value bits set: 0, which has no bits set at all, and the number that has only its sign bit set (-2147483648 in 32-bit representation). When you flip all the bits of either of these, you end up with all value bits set. Therefore, when you subsequently add 1, the result overflows the value bits. If you imagine performing the addition as if the number were unsigned, treating the sign bit as a value bit, then you get

    -2147483648 (decimal representation)
-->  0x80000000 (convert to hex)
-->  0x7fffffff (flip bits)
-->  0x80000000 (add one)
--> -2147483648 (convert to decimal)

Something similar applies to negating zero, but in that case the carry produced by adding 1 runs past the erstwhile sign bit, too. If the overflow is ignored, the resulting 32 low-order bits are all zero, hence -0 == 0.

I'm gonna use a 4-bit number, just to make the maths simple, but the idea is the same.

In a 4-bit number, the possible values are between 0000 and 1111. That would be 0 to 15, but if you wanna represent negative numbers, the first bit is used to indicate the sign (0 for positive and 1 for negative).

So 1111 is not 15. As the first bit is 1, it's a negative number. To know its value, we use the two's complement method as already described in previous answers: "invert the bits and add 1":

  • inverting the bits: 0000
  • adding 1: 0001

0001 in binary is 1 in decimal, so 1111 is -1.

The two's complement method goes both ways, so if you use it with any number, it will give you the binary representation of that number with the sign inverted.

Now let's see 1000. The first bit is 1, so it's a negative number. Using the two's complement method:

  • invert the bits: 0111
  • add 1: 1000 (8 in decimal)

So 1000 is -8. If we do -(-8), in binary it means -(1000), which actually means using the two's complement method on 1000. As we saw above, the result is also 1000. So, in a 4-bit number, -(-8) equals -8.

In a 32-bit number, -2147483648 in binary is 1000..(31 zeroes), and if you use the two's complement method on it, you end up with the same value (the result is the same number).

That's why in a 32-bit number -(-2147483648) equals -2147483648.

It depends on the version of C, the specifics of the implementation, and whether we are talking about variables or literal values.

The first thing to understand is that there are no negative integer literals in C. "-2147483648" is a unary minus operation applied to a positive integer literal.

Let's assume that we are running on a typical 32-bit platform where int and long are both 32 bits and long long is 64 bits, and consider the expression

(-(-2147483648) == -2147483648)

The compiler needs to find a type that can hold 2147483648. A conforming C99 compiler will use type long long, but a C90 compiler can use type unsigned long.

If the compiler uses type long long, then nothing overflows and the comparison is false. If the compiler uses unsigned long, then the unsigned wraparound rules come into play and the comparison is true.

For the same reason that winding a tape deck counter 500 steps forward from 000 (through 001 002 003 ...) will show 500, and winding it 500 steps backward from 000 (through 999 998 997 ...) will also show 500.

This is two's complement notation. Of course, since the two's complement sign convention is to treat the topmost bit as the sign bit, the result overflows the representable range, just as 2000000000+2000000000 overflows the representable range.

As a result, the processor's "overflow" bit will be set (seeing this requires access to the machine's arithmetic flags, which most programming languages other than assembly do not expose). This is the only value which will set the "overflow" bit when negating a two's complement number: the negation of any other value lies in the range representable by two's complement.
