
Inconsistencies in sign extension when shifting signed int vs short

int main(){
  signed int a = 0b00000000001111111111111111111111; 
  signed int b = (a << 10) >> 10;
  // b is: 0b11111111111111111111111111111111

  signed short c = 0b0000000000111111; 
  signed short d = (c << 10) >> 10;
  // d is: 0b111111

  return 0;
}

Assuming int is 32 bits and short is 16 bits,

Why would b get sign extended but d does not get sign extended? I have tested this with gdb on x64, compiled with gcc.

In order to get short sign extended, I had to use two separate variables like this:

  signed short f = c << 10;
  signed short g = f >> 10;
  // g is: 0b1111111111111111

In the case of signed short, when an integer type smaller than int is used in an expression it is (in most cases) promoted to type int. This is spelled out in section 6.3.1.1p2 of the C standard:

The following may be used in an expression wherever an int or unsigned int may be used

  • An object or expression with an integer type (other than int or unsigned int) whose integer conversion rank is less than or equal to the rank of int and unsigned int.
  • A bit-field of type _Bool, int, signed int, or unsigned int.

If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.

And this promotion specifically happens in the case of bitwise shift operators as specified in section 6.5.7p3:

The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.

So the short value 0x003f is promoted to the int value 0x0000003f and the left shift is applied. This results in 0x0000fc00, and the right shift gives a result of 0x0000003f.
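The promotion can be made visible with a small sketch. The helper names below are hypothetical, and the code assumes a C11 compiler (for _Generic) with a 32-bit int, as in the question:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: mirrors (c << 10) >> 10 with the promotion spelled out. */
int shift_round_trip(short c)
{
    /* Keep the left shift representable in int so the behavior stays defined. */
    assert(c >= 0 && c <= (INT_MAX >> 10));

    int promoted = c;             /* integer promotion: short -> int, value unchanged */
    int shifted = promoted << 10; /* 0x3f becomes 0xfc00; the sign bit of int stays 0 */
    return shifted >> 10;         /* nonnegative value, so this is a plain division by 1024 */
}

/* Hypothetical helper: returns 1 if the expression c << 10 has type int. */
int shift_result_is_int(short c)
{
    return _Generic(c << 10, int: 1, default: 0);
}
```

Calling shift_round_trip(0x3f) returns 0x3f, and shift_result_is_int reports that the shift expression has type int rather than short, which is why no sign extension occurs.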

The signed int case is a bit more interesting. In this case you're left-shifting a bit with the value 1 into the sign bit. This triggers undefined behavior as per 6.5.7p4:

The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

So while the output you get for the signed int case is what you might expect it to be, it's actually undefined behavior and so you can't depend on that result.
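If the goal is to sign-extend the low bits of a value, one way to avoid the undefined shift is to do the work in unsigned arithmetic. This is a minimal sketch, not something from the question; the function name sign_extend and the use of the fixed-width types from <stdint.h> are assumptions made for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: treat the low `bits` bits of x as a two's complement
   number and return its value, using only well-defined operations. */
int32_t sign_extend(uint32_t x, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);

    uint32_t sign = UINT32_C(1) << (bits - 1); /* the field's sign bit */
    uint32_t mask = (sign << 1) - 1;           /* low `bits` bits set */

    x &= mask;
    uint32_t r = (x ^ sign) - sign;            /* unsigned arithmetic wraps, no UB */

    /* Map r to int32_t by arithmetic instead of an out-of-range conversion. */
    if (r <= INT32_MAX)
        return (int32_t)r;
    return -(int32_t)(UINT32_MAX - r) - 1;
}
```

With the value from the question, sign_extend(0x003FFFFF, 22) yields -1, which matches what the shift pair happened to produce, but without relying on undefined behavior.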

short is automatically converted to int by the integer promotions, per C 2018 6.5.7 3:

The integer promotions are performed on each of the operands…

So (c << 10) shifts an int 0b111111 left 10 bits, yielding (in your C implementation) the 32-bit int 0b00000000000000001111110000000000. The sign bit in that is zero; it is a positive number.

When you do signed short f = c << 10;, the result of c << 10 is too big to fit in a signed short. It is 64,512, which is above the largest value your signed short can represent, 32,767. In an assignment, the value is converted to the type of the left operand. Per C 2018 6.3.1.3 3, the conversion is implementation-defined. GCC defines this conversion to wrap modulo 65,536 (two to the power of the number of bits in the type). So converting 64,512 yields 64,512 − 65,536 = −1024. So f is set to −1024.
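The wrap-around arithmetic can be written out explicitly. The sketch below computes, with ordinary well-defined arithmetic, the value that a modulo-65,536 conversion to a 16-bit signed type would produce; the function name wrap_to_16_bits is hypothetical and the 16-bit width is the assumption stated in the question:

```c
#include <assert.h>

/* Hypothetical helper: reduce value modulo 65,536 into the range of a
   16-bit two's complement type, [-32768, 32767]. */
long wrap_to_16_bits(long value)
{
    long m = value % 65536L; /* remainder has the sign of value (C99 and later) */
    if (m < 0)
        m += 65536L;         /* now in 0..65535 */
    if (m > 32767L)
        m -= 65536L;         /* upper half maps to the negative values */

    assert(m >= -32768L && m <= 32767L);
    return m;
}
```

For the value in question, wrap_to_16_bits(64512) gives -1024, the same number GCC stores in f.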

Then, in f >> 10, you are shifting a negative value. As signed short, f is still promoted to int, but this conversion keeps the value, resulting in an int value of −1024. This is then shifted. This shift is implementation-defined, and GCC defines it to shift with sign extension. So the result of -1024 >> 10 is −1.
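Code that must not depend on the implementation-defined result can get the same sign-extending behavior from arithmetic on nonnegative values only. This is a sketch under that goal; arithmetic_shift_right is a hypothetical name, not a standard function:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: computes floor(x / 2^n), which is what a
   sign-extending right shift yields, without shifting a negative value. */
int arithmetic_shift_right(int x, unsigned n)
{
    assert(n < sizeof(int) * CHAR_BIT);

    if (x >= 0)
        return x >> n;           /* well-defined for nonnegative x */

    /* For negative x, -(x + 1) is nonnegative and cannot overflow,
       even when x is INT_MIN. */
    int flipped = -(x + 1);
    return -(flipped >> n) - 1;
}
```

Here arithmetic_shift_right(-1024, 10) returns -1, matching the GCC result described above, and arithmetic_shift_right(64512, 10) returns 63.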

For starters, according to the C Standard (6.5.7 Bitwise shift operators):

3 The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand.

Thus this value

signed short c = 0b0000000000111111;

in the expression used in this declaration

signed short d = (c << 10) >> 10;

is promoted to the integer type int. As the value is positive, the promoted value is also positive.

Thus this operation

c << 10

does not touch the sign bit.

On the other hand this code snippet

signed int a = 0b00000000001111111111111111111111; 
signed int b = (a << 10) >> 10;

has undefined behavior because, according to the same section of the C Standard:

4 The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

