简体   繁体   中英

how does c compiler handle unsigned and signed integer? Why the assembly code for unsigned and signed arithmetic operation are the same?

I am reading the book: CS-APPe2. C has unsigned and signed int type and in most architectures uses two's-complement arithmetic to implement signed value; but after learning some assembly code, I found that very few instructions distinguish between unsigned and signed. So my question are:

  1. Is it the compiler's responsibility to differentiate signed and unsigned? If yes, how does it do that?

  2. Who implements the two's-complement arithmetic - the CPU or the complier?

Add some more info:

After learning some more instructions, actually there are some of them differentiate between signed and unsigned, such as setg,seta,etc. Further, CF and OF apply to unsigned and respectively. But most integer arithmetic instructions treat unsigned and signed the same,eg

int s = a + b

and

unsigned s = a + b

generate the same instruction.

So when executing ADD sd , should the CPU treat s&d unsigned or signed? Or it is irrelevant, because the bit pattern of both result are the same and it is the compiler's task to convert the underlying bit pattern result to unsigned or signed?

PS i am using x86 and gcc

In many cases there is no difference at the machine level between signed and unsigned operations, and it is merely a matter of interpretation of the bit pattern. For example consider the following 4-bit word operation:

Binary Add  Unsigned   2's comp
----------  --------   --------
  0011          3         3
+ 1011       + 11       - 5
-------     --------   --------
  1110         14        -2  
-------     --------   --------

The binary pattern is the same for the signed and unsigned operation. Note that subtraction is merely the addition of a negative value. When a SUB operation is performed, the right-hand operand is two's complemented (invert bits and increment) then added (the ALU circuit responsible is an adder ); not at the instruction level you understand, but at the logic level, although it would be possible to implement a machine without a SUB instruction and still perform subtraction albeit in two instructions rather than one.

There are some operations that do require different instructions depending on the type, and it is the compiler's responsibility to generate appropriate code generally speaking - architectural variations may apply.

It's quite easy. Operations like addition and subtraction don't need any adjustment for signed types in two's complement arithmetic. Just perform a mind experiment and imagine an algorithm using just the following mathematical operations:

  • increment by one
  • decrement by one
  • compare with zero

Addition is just taking items one by one from one heap and putting them to the other heap until the first one is empty. Subtraction is taking from both of them at once, until the subtracted one is empty. In modular arithmetics, you just just treat the smallest value as the largest value plus one and it works. Two's complement is just a modular arithmetic where the smallest value is negative.

If you want to see any difference, I recommend you to try operations that aren't safe with respect to overflow. One example is comparison ( a < b ).

Is it the complier's responsibility to differentiate signed and unsigned? If yes, how does it do that?

By generating different assembly whenever needed.

Who implements the two's-complement arithmetic - the CPU or the complier?

It's a difficult question. Two's complement is probably the most natural way to work with negative integers in a computer. Most operations for two's complement with overflow are the same as for unsigned integers with overflow. The sign can be extracted from a single bit. Comparison can be conceptually done by subtraction (which is signedness-agnostic), sign bit extraction and comparison to zero.

It's the CPU's arithmetic features that allow the compiler to produce computations in two's complement, though.

unsigned s = a + b

Note that the way plus is computed here don't depend on the result's type. Insead it depends on the types of variables to the right of the equal sign.

So when executing ADD sd, should the CPU treat s&d unsigned or signed?

CPU instructions don't know about the types, those are only used by the compiler. Also, there's no difference between adding two unsigned numbers and adding two signed numbers. It would be stupid to have two instructions for the same operation.

There's no need to differentiate signed and unsigned ints for most arithmetic/logic operations. It's often only need to take the sign into account when printing, zero/sign extending or comparing values. In fact the CPU doesn't know anything about a value's type. A 4-byte value is just a series of bits, it doesn't have any meaning unless the user points out that's a float, an array of 4 chars, an unsigned int or signed int, etc. For example when printing a char variable, depending on the type and output properties indicated, it will print out the character, unsigned integer or signed integer. It's the programmer's responsibility to show the compiler how to treat that value and then the compiler will emit the correct instruction needed to process the value.

A lot has been said about your first question, but I like to say something about your second one:

Who implements the two's-complement arithmetic - the CPU or the complier?

The C standard does not require negative numbers to have two's-complement, it doesn't define at all how hardware expresses negative numbers. The task of the compiler is it to translate your C code into CPU instructions that do what you your code requests. So whether the C compiler will create code for two's-complement arithmetic or not depends solely on the fact if your CPU uses two's-complement arithmetic or not. The compiler must know how the CPU works and create code accordingly. So the correct answer to that question is: The CPU.

If your CPU was using a one's-complement representation, than a C compiler for that CPU would emit one's-complement instructions. On the other hand, a C compiler can emulate support for negative numbers on a CPU that doesn't know negative numbers at all. As the two's-complement allows you to ignore if a number is signed or unsigned for many operations, this is not too hard to do. In that case it would have been the compiler implementing the two's-complement arithmetic. This could also be done on a CPU that has an representation for negative numbers, yet why should the compiler do that and not just use the native form the CPU understands? So it won't do that unless it has to.

This bothered me too for a long time. I didn't get to know how compiler works as a program while handing its defaults and implicit instructions. But my search for an answer led me to following conclusions :

Real world uses signed integers only, since the discovery of negative numbers. that is the reason int is considered a signed integer by default in the compiler. I totally ignore the unsigned number arithmetic since it is useless.

CPU has no clue of signed and unsigned integers. It just knows bits - 0 and 1. how you interpret its output is up to you as an assembly programmer. That makes assembly programming tedious. Dealing with integers (signed & unsigned) involved lot of flag-checking. That is why high-level languages were developed. compiler takes all the pain away.

How compiler works is a very advance learning. I accepted that at present it is beyond my understanding. This acceptance helped me to move on in my course.

In x86 architecture:

add and sub instructions modify flags in the eflags register. These flags can then be used in conjunction with adc and sbb instructions to build arithmetic with higher precision. In such case, we move the size of the numbers into ecx register. The number of times loop instruction is carried out is the same as the size of the numbers in bytes.

Sub instruction takes the 2's complement of the subtrahend, add it to the minuend, invert the carry. This is done in hardware (implemented in circuit). Sub instruction 'activates' a different circuit. After using the sub instruction, programmer or compiler checks the CF. If it is 0, the result is positive and the destination has correct result. If it is 1, the result is negative and the destination has the 2's complement of the result. Normally, the result is left in 2's complement and read as a signed number, but the NOT and INC instructions can be used to change it. The NOT instruction performs the 1's complement of the operand, then the operand is incremented to get the 2's complement.

When a programmer has planned to read the result of an add or sub instruction as a signed number, he shall be watch the OF flag. If it is set 1, the result is wrong. He should sign-extend the numbers before running the operation between them.

2's complement is just a map between decimal and binary number.

The compiler is implementing this mapping by translating literal number to corresponding binary, for example -3 to 0xFFFFFFFD(as can be seen in disassembly), and producing machine code that is consistent with 2's complement representation. For example, when it tries to execute 0-3, it should choose a instuction that should produce 0xFFFFFFFD by taking 0x00000000 and 0x000000003 as arguments.

The reason it chooses SUB which is same for unsigned subtraction, is it simply produces 0xFFFFFFFD as expected. No need to ask CPU to provide special SUB for signed subtraction. To say that second operand is inversed by 2's complement and by this deduce the conclusion that CPU implements 2's complement in SUB is unfair. Because borrowing from higher bit in subtraction happens to be the same as 2's complement inversing and SUB is also used for unsigned subtraction, no need to involve the concept of 2's complememt in SUB at all.

The following disassembly illustrate the fact that signed subtraction use same SUB as unsigned.

//int32_3 = -3;
010B2365  mov         dword ptr [int32_3],0FFFFFFFDh  
//int32_1 = 0, int32_2 = 3;
010B236C  mov         dword ptr [int32_1],0  
010B2373  mov         dword ptr [int32_2],3  
//uint32_1 = 0, uint32_2 = 3;
010B237A  mov         dword ptr [uint32_1],0  
010B2384  mov         dword ptr [uint32_2],3  
//int32_3 = int32_1 - int32_2;
010B238E  mov         eax,dword ptr [int32_1]  
010B2391  sub         eax,dword ptr [int32_2]  
010B2394  mov         dword ptr [int32_3],eax  
//uint32_3 = uint32_1 - uint32_2;
010B2397  mov         eax,dword ptr [uint32_1]  
010B239D  sub         eax,dword ptr [uint32_2]  
010B23A3  mov         dword ptr [uint32_3],eax  

The CPU preserve additional imformation in CF and OF flags for further instructions that use result of SUB in different ways depending on the variable type that is assigned the result.

The following disassembly illustrate how compiler generates different instructions for signed compare and unsigned compare. Notice cmp includes a internal sub , and jle is based on OF flag and jbe is based on CF flag.

//if (int32_3  > 1)  int32_3 = 0;
010B23A9  cmp         dword ptr [int32_3],1  
010B23AD  jle         main+76h (010B23B6h)  
010B23AF  mov         dword ptr [int32_3],0  
//if (uint32_3 > 1) uint32_3 = 0;
010B23B6  cmp         dword ptr [uint32_3],1  
010B23BD  jbe         main+89h (010B23C9h)  
010B23BF  mov         dword ptr [uint32_3],0 

It is the OF gives away the fact that CPU implements 2's complement, because the way the OF is set is when middle binary number 0x10000000 or 0x0FFFFFFF is exceeded. And 2's complement representation maps 0x10000000 to -268435456 and 0x0FFFFFFF to 268435455, which are the upper and lower limit of 32bit integer. So this OF flag is designed specifically for 2's complement, because other representation might choose to map other binary numbers to the upper and lower limit.

To conclude: 1. Compiler differentiates signed and unsigned arithmetics by implementing corresponding representations(mapping) and by generating instructions that the result of which comply with compiler's representation of signed and unsigned integer. 2. Compiler implements 2's complement representation and CPU also implement it to support compiler in generating arithmetic instructions that the result of which comply with 2's complement representatiom.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM