
C: type casting char values into unsigned short

Starting with a pseudo-code snippet:

char a = 0x80;
unsigned short b;
b = (unsigned short)a;
printf ("0x%04x\r\n", b); // => 0xff80

To my current understanding, "char" is by definition neither a signed char nor an unsigned char, but sort of a third kind of signedness.

How does it come about that 'a' is first sign-extended from a (maybe platform-dependent) 8-bit storage to the (again maybe platform-specific) 16 bits of a signed short, and only then converted to an unsigned short?

Is there a C standard that determines the order of expansion?

Does this standard give any guidance on how to deal with this third kind of signedness of a "pure" char (I once called it an X-char, X for undetermined signedness), so that results are at least deterministic?

PS: if an "(unsigned char)" cast is inserted in front of the 'a' in the assignment line, then the result in the printing line indeed changes to 0x0080. Thus only two casts in a row provide what might be the intended result for certain intentions.

The type char is not a "third" signedness. It is either signed char or unsigned char, and which one it is is implementation-defined.

This is dictated by section 6.2.5p15 of the C standard:

The three types char, signed char, and unsigned char are collectively called the character types. The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char.

It appears that on your implementation char is the same as signed char, so because the value is negative and the destination type is unsigned, the value must be converted.

Section 6.3.1.3 dictates how conversions between integer types occur:

1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.

2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.

3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.

Since the value 0x80 == -128 cannot be represented in an unsigned short, the conversion in paragraph 2 occurs.

char has implementation-defined signedness. It is either signed or unsigned, depending on the compiler. It is true, in a way, that char is a third character type: it is a distinct type from both signed char and unsigned char. char has non-portable signedness and therefore should never be used for storing raw numbers.

But that doesn't matter in this case.

  • On your compiler, char is signed.
  • char a = 0x80; forces a conversion from the type of 0x80, which is int, to char, in a compiler-specific manner. Normally on 2's complement systems, that will mean that the char gets the value -128, as seems to be the case here.
  • b = (unsigned short)a; forces a conversion from char to unsigned short 1). C17 6.3.1.3 Signed and unsigned integers then says:

    Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.

    One more than the maximum value would be 65536. So you can think of this as -128 + 65536 = 65408.

  • The unsigned hex representation of 65408 is 0xFF80. No sign extension takes place anywhere!


1) The cast is not needed. When both operands of = are arithmetic types, as in this case, the right operand is implicitly converted to the type of the left operand (C17 6.5.16.1 §2).

