Wrap-around explanation for signed and unsigned variables in C?

I read in the C spec that unsigned variables (in particular unsigned short int) perform a so-called wrap-around on integer overflow, but I couldn't find anything about signed variables except that I am left with undefined behavior.

My professor told me that their values also get wrapped around (maybe he just meant gcc). I thought the bits just get truncated and the bits I am left with give me some weird value!

What is wrap-around, and how is it different from just truncating bits?

Signed integer variables do not have wrap-around behavior in the C language. Signed integer overflow during arithmetic computations produces undefined behavior. Note, by the way, that the GCC compiler you mentioned is known for implementing strict overflow semantics in optimizations, meaning that it takes advantage of the freedom provided by such undefined behavior: GCC assumes that signed integer values never wrap around. That means GCC happens to be one of the compilers in which you cannot rely on wrap-around behavior of signed integer types.

For example, GCC can assume that for a variable int i the following condition

if (i > 0 && i + 1 > 0)

is equivalent to a mere

if (i > 0)

This is exactly what strict overflow semantics means.
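
To make the consequence concrete, here is a minimal sketch. The function names `naive_can_increment` and `safe_can_increment` are made up for illustration. The first mirrors the condition above and may legally be folded down to `i > 0` by an optimizing compiler; the second asks the same question without ever evaluating an expression that could overflow.

```c
#include <limits.h>
#include <stdbool.h>

/* Mirrors the condition above. When i == INT_MAX, i + 1 overflows, which is
   undefined behavior, so a compiler may reduce the whole test to i > 0. */
bool naive_can_increment(int i)
{
    return i > 0 && i + 1 > 0;
}

/* Same intent with no overflowing arithmetic: compare against the limit. */
bool safe_can_increment(int i)
{
    return i > 0 && i < INT_MAX;
}
```

Calling `safe_can_increment(INT_MAX)` is well defined and returns false, whereas the result of `naive_can_increment(INT_MAX)` cannot be relied upon.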

Unsigned integer types implement modulo arithmetic. The modulus is equal to 2^N, where N is the number of bits in the value representation of the type. For this reason unsigned integer types do indeed appear to wrap around on overflow.
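
A small sketch of that modulo rule for unsigned int follows. The helper names are hypothetical, chosen only for this example.

```c
#include <limits.h>

/* UINT_MAX + 1 is reduced modulo 2^N, so the result is 0. */
unsigned max_plus_one(void)
{
    return UINT_MAX + 1u;
}

/* Subtraction wraps the same way: 0 - 1 modulo 2^N is UINT_MAX. */
unsigned zero_minus_one(void)
{
    return 0u - 1u;
}

/* General case: the sum is always well defined for unsigned operands. */
unsigned add_mod(unsigned a, unsigned b)
{
    return a + b;
}
```

Both operands are unsigned int here, so no promotion to a signed type takes place and every result is defined by the language.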

However, the C language never performs arithmetic computations in domains smaller than that of int / unsigned int. The type unsigned short int that you mention in your question will typically be promoted to type int in expressions before any computations begin (assuming that the range of unsigned short fits into the range of int). This means that 1) computations with unsigned short int will be performed in the domain of int, with overflow happening when int overflows, and 2) overflow during such computations will lead to undefined behavior, not to wrap-around behavior.

For example, this code produces a wrap-around

unsigned i = USHRT_MAX;
i *= INT_MAX; /* <- unsigned arithmetic, overflows, wraps around */

while this code

unsigned short i = USHRT_MAX;
i *= INT_MAX; /* <- signed arithmetic, overflows, produces undefined behavior */

leads to undefined behavior.

If no int overflow happens and the result is converted back to unsigned short int, it is again reduced modulo 2^N, which will appear as if the value has wrapped around.
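
The well-defined case can be sketched as follows. This assumes a platform where USHRT_MAX fits in int (for instance 16-bit short and 32-bit int); `increment_ushort` is a hypothetical name used only here.

```c
#include <limits.h>

/* s + 1 is evaluated in int after integer promotion (assumption: USHRT_MAX
   fits in int), so the addition itself cannot overflow. Converting the
   result back to unsigned short reduces it modulo USHRT_MAX + 1, which
   looks exactly like a wrap-around. */
unsigned short increment_ushort(unsigned short s)
{
    int wide = s + 1;            /* computed in the domain of int */
    return (unsigned short)wide; /* well-defined modulo conversion */
}
```

Under that assumption, `increment_ushort(USHRT_MAX)` yields 0 without any undefined behavior, because the only out-of-range step is the conversion to an unsigned type.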

Imagine you have a data type that's only 3 bits wide. This allows you to represent 8 distinct values, from 0 through 7. If you add 1 to 7, you will "wrap around" back to 0, because you don't have enough bits to represent the value 8 (binary 1000).
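
As a rough sketch, the 3-bit type can be simulated in the low bits of an unsigned int. The function names are invented for this example. It also bears on the original question: for unsigned values, reducing modulo 2^3 and truncating to the low 3 bits give the same result.

```c
/* Hypothetical 3-bit unsigned counter held in the low bits of an unsigned. */
unsigned inc3_modulo(unsigned x)
{
    return (x + 1u) % 8u;   /* wrap around: reduce modulo 2^3 */
}

unsigned inc3_truncate(unsigned x)
{
    return (x + 1u) & 7u;   /* truncate: keep only the low 3 bits */
}
```

Both functions map 7 to 0 and agree on every other input from 0 through 7.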

This behavior is well-defined for unsigned types. It is not well-defined for signed types, because there are multiple methods for representing signed values, and the result of an overflow will be interpreted differently depending on the method.

Sign-magnitude: the uppermost bit represents the sign; 0 for positive, 1 for negative. If my type is three bits wide again, then I can represent signed values as follows:

000  =  0
001  =  1
010  =  2
011  =  3
100  = -0
101  = -1
110  = -2
111  = -3

Since one bit is taken up by the sign, I only have two bits to encode a magnitude from 0 to 3. If I add 1 to 3, I'll overflow with -0 as the result. Yes, there are two representations for 0, one positive and one negative. You won't encounter sign-magnitude representation all that often.

One's-complement: the negative value is the bitwise inverse of the positive value. Again, using the three-bit type:

000  =  0
001  =  1
010  =  2
011  =  3
100  = -3
101  = -2
110  = -1 
111  = -0

I have three bits to encode my values, but the range is [-3, 3]. If I add 1 to 3, I'll overflow with -3 as the result. This is different from the sign-magnitude result above. Again, there are two encodings for 0 using this method.

Two's-complement: the negative value is the bitwise inverse of the positive value, plus 1. In the three-bit system:

000  =  0
001  =  1
010  =  2
011  =  3
100  = -4
101  = -3
110  = -2
111  = -1

If I add 1 to 3, I'll overflow with -4 as a result, which is different from the previous two methods. Note that we have a slightly larger range of values, [-4, 3], and only one representation for 0.
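
The three tables above can be reproduced with a small decoder. This is only an illustrative sketch: the function names are hypothetical and each function expects a 3-bit pattern in the range 0 through 7.

```c
/* Sign-magnitude: top bit is the sign, low two bits are the magnitude. */
int decode_sign_magnitude(unsigned bits)
{
    int magnitude = (int)(bits & 3u);
    return (bits & 4u) ? -magnitude : magnitude;
}

/* One's complement: a set top bit means "negative of the bitwise inverse". */
int decode_ones_complement(unsigned bits)
{
    return (bits & 4u) ? -(int)(~bits & 7u) : (int)bits;
}

/* Two's complement: the top bit carries weight -4 instead of +4. */
int decode_twos_complement(unsigned bits)
{
    return (bits & 4u) ? (int)bits - 8 : (int)bits;
}
```

Feeding the pattern 100 (the result of adding 1 to 011) to the three decoders gives 0 (that is, -0), -3 and -4 respectively, matching the three different overflow results described above.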

Two's complement is probably the most common method of representing signed values, but it's not the only one, hence the C standard can't make any guarantees about what will happen when you overflow a signed integer type. It leaves the behavior undefined so that the compiler doesn't have to deal with interpreting multiple representations.

The undefined behavior comes from early portability issues, when signed integer types could be represented as sign and magnitude, one's complement, or two's complement.

Nowadays, virtually all architectures represent integers in two's complement, and the hardware does wrap around. But be careful: since your compiler is entitled to assume that your program never triggers undefined behavior, you might encounter weird bugs when optimization is on.
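
One way to stay clear of such bugs is to test for overflow before performing the addition. The following is a minimal sketch, not the only possible approach; `checked_add` is a hypothetical helper name.

```c
#include <limits.h>
#include <stdbool.h>

/* Stores a + b in *out and returns true if the sum is representable in int.
   Returns false and leaves *out untouched if the sum would overflow.
   The range check itself never performs an overflowing operation. */
bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}
```

GCC and Clang also accept the `-fwrapv` option, which makes signed overflow wrap, but code that depends on it is no longer portable C.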

In a signed 8-bit integer, the intuitive definition of wrap-around might look like going from +127 to -128; in two's complement binary, from 01111111 (127) to 10000000 (-128). As you can see, that is the natural progression of incrementing the binary data, without considering whether it represents an integer, signed or unsigned. Counterintuitively, in the unsigned integer's sense of wrap-around, the actual overflow takes place when moving from 11111111 (-1 as signed, 255 as unsigned) to 00000000 (0).
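
The bit patterns above can be inspected with the fixed-width types from `<stdint.h>`. This is a sketch under the assumption that the platform provides `uint8_t` and `int8_t`; the function names are invented for illustration. The increment is done on the unsigned pattern, where it is well defined, and the bits are then reinterpreted as signed.

```c
#include <stdint.h>
#include <string.h>

/* Increment the raw 8-bit pattern using unsigned arithmetic. */
uint8_t next_pattern(uint8_t bits)
{
    return (uint8_t)(bits + 1u);
}

/* Reinterpret the same 8 bits as a signed value. int8_t, where it exists,
   is required to be two's complement with no padding bits. */
int8_t as_signed(uint8_t bits)
{
    int8_t value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

Here `next_pattern(0x7F)` gives 0x80, which `as_signed` reads as -128, while `next_pattern(0xFF)` gives 0x00, the point at which the unsigned value wraps.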

This doesn't answer the deeper question of what the correct behavior is when a signed integer overflows, because according to the standard there is no "correct" behavior.


 