C cast and char signedness

I recently read about the three distinct character types in C: char, unsigned char and signed char. The problem I describe here is not one I have actually run into so far: my program works correctly on all tested computers and only targets little-endian machines (basically all modern desktops and servers running Windows or Linux, right?). I frequently reuse a char array, which I defined for holding a "string" (not a real string, of course), as temporary variables. E.g. instead of adding another char to the stack, I just reuse one of the elements, like array[0]. However, I based this tactic on the assumption that a char is always signed, until I read today that this actually depends on the implementation. What happens if I now have a char and assign a negative value to it?

char unknownsignedness = -1;

If I wrote

unsigned char A = -1;

I think that the C-style cast will simply reinterpret the bits, so that the value A represents as an unsigned type becomes different. Am I right that these C-style casts are simply a reinterpretation of the bits? I am referring to signed <-> unsigned conversions here.

So if an implementation has char as unsigned, would my program stop working as intended? Take the last variable, if I now do

if (A == -1)

I am now comparing an unsigned char to a signed char value, so will this simply compare the bits without caring about signedness, or will it return false because A obviously cannot be -1? I am confused about what happens in this case. This is also my greatest concern, as I use chars like this frequently.

The following code prints No:

#include <stdio.h>

int
main()
{
    unsigned char a;

    a = -1;

    if(a == -1)
        printf("Yes\n");
    else
        printf("No\n");

    return 0;
}

The assignment a = -1 converts -1 to unsigned char. The result is well defined: it is UCHAR_MAX, which is 255 on most machines (wherever CHAR_BIT is 8). The test a == -1 compares an unsigned char to an int, so the integer promotions apply; hence, it is interpreted as

`(int)a == -1`

Since a is 255, (int)a is still 255, and the test yields false.

unsigned char a = -1;

ISO/IEC 9899:1999 says in 6.3.1.3/2:

if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type

We add (UCHAR_MAX+1) to -1 once, and the result is UCHAR_MAX, which is obviously in range for unsigned char.

if (a == -1)

There's a long passage in 6.3.1.8/1:

If both operands have the same type, then no further conversion is needed.

Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank is converted to the type of the operand with greater rank.

Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.

Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.

Otherwise, both operands are converted to the unsigned integer type corresponding to the type of the operand with signed integer type.

The rank of unsigned char is less than that of int.

If int can represent all the values that unsigned char can (which is usually the case), then both operands are converted to int, and the comparison returns false.

If int cannot represent all values in unsigned char, which can happen on rare machines with sizeof(int)==sizeof(char), then both are converted to unsigned int, -1 gets converted to UINT_MAX, which happens to be the same as UCHAR_MAX, and the comparison returns true.

unsigned char A = -1;

results in 255 (assuming an 8-bit char). Nothing is reinterpreted upon assignment or initialization; the value is converted. In two's complement notation, -1 is just a run of 1 bits, so the conversion has the same effect as copying the low 8 of them verbatim.

Comparisons are a bit different, as the literal -1 is of int type.

if (A == -1)

will do a promotion (an implicit conversion, equivalent to (int)A) before the comparison, so you end up comparing 255 with -1. Not equal.

And yes, you have to be careful with plain char.

I think this question is best answered by a quick example (warning: C++, but see explanation for my reasoning):

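#include <cstdio>

int main()
{
    char c = -1;
    unsigned char u = -1;
    signed char s = -1;

    // Each operand is promoted to int before the comparison.
    if (c == u)
        std::printf("c == u\n");
    if (s == u)
        std::printf("s == u\n");
    if (s == c)
        std::printf("s == c\n");
    // After an explicit cast both sides have the same type and value.
    if (static_cast<unsigned char>(s) == u)
        std::printf("(unsigned char)s == u\n");
    if (c == static_cast<char>(u))
        std::printf("c == (char)u\n");
    return 0;
}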

The output (on an implementation where plain char is signed):

s == c
(unsigned char)s == u
c == (char)u

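C treats the values differently when they are used as-is, because each operand is first promoted to int. You are correct that, on a typical two's complement machine, casting between signed and unsigned types of the same width leaves the bits unchanged, although the language defines the conversion in terms of values rather than bits. I used a C++ static_cast here to show that the compiler is okay with doing this cast. In C, you would just cast by prefixing the type in parentheses. There is no compiler checking to ensure that the cast is safe in C.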
