
Incorrect hex representations of characters with char but correct with unsigned char

I was writing a function that prints the "hexdump" of a given file. The function is as stated below:

bool printhexdump (FILE *fp) {
    long unsigned int filesize = 0;
    char c;

    if (fp == NULL) {
        return false;
    }

    while (! feof (fp)) {
        c = fgetc (fp);
        if (filesize % 16 == 0) {
            if (filesize >= 16) {
                printf ("\n");
            }
            printf ("%08lx  ", filesize);
        }
        printf ("%02hx ", c);
        filesize++;
    }

    printf ("\n");
    return true;
}

However, on certain files, some invalid hex representations seem to get printed, for example:

00000000  4d 5a ff90 00 03 00 00 00 04 00 00 00 ffff ffff 00 00
00000010  ffb8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00
00000020  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000030  00 00 00 00 00 00 00 00 00 00 00 00 ff80 00 00 00
00000040  ffff

Except for the last ffff, caused by the EOF character, the ff90, ffff, ffb8 etc. are wrong. However, if I change char to unsigned char, I get the correct representation:

00000000  4d 5a 90 00 03 00 00 00 04 00 00 00 ff ff 00 00
00000010  b8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00
00000020  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000030  00 00 00 00 00 00 00 00 00 00 00 00 80 00 00 00
00000040  ff

Why would the above behaviour happen?

Edit: the treatment of c by printf() should be the same since the format specifiers don't change. So I'm not sure how char would get sign-extended while unsigned char wouldn't?

Q: the treatment of c by printf() should be the same since the format specifiers don't change.
A: OP is correct, the treatment of c by printf() did not change. What changed was what was passed to printf(). As char or unsigned char, c goes through the usual integer promotions, typically to int. char, if signed, gets a sign extension. A char value with bit pattern 0xFF is -1; an unsigned char value 0xFF remains 255.

Q: So I'm not sure how char would get sign extended while unsigned char won't?
A: They both get widened to int. A char may be negative, so the bits added by the widening may be all 0s or all 1s (sign extension). An unsigned char is never negative, so the added bits are always 0s.
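
A minimal sketch of that promotion (not from the answer above; it assumes a platform where plain char is signed, as on x86):

#include <stdio.h>

int main (void) {
    char sc = (char) 0xFF;      /* holds -1 where plain char is signed */
    unsigned char uc = 0xFF;    /* always holds 255 */

    /* Both arguments go through the usual integer promotions to int. */
    printf ("char promoted to int:          %d\n", sc);   /* typically -1 */
    printf ("unsigned char promoted to int: %d\n", uc);   /* always 255   */
    return 0;
}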


Solution

char c;
printf ("%02x ", (unsigned char) c);
// or
printf ("%02hhx ", c);

// or
unsigned char c;
printf ("%02x ", c);
// or
printf ("%02hhx ", c);

char can be a signed type, and in that case values 0x80 to 0xff get sign-extended before being passed to printf .

(char)0x80 is sign-extended to -128, which in unsigned short is 0xff80.

[edit] To be clearer about promotion: the value stored in a char is eight bits, and in that eight-bit representation a value like 0x90 will represent either -112 or 144, depending on whether the char is signed or unsigned. This is because the most significant bit is taken as the sign bit for signed types, and as a magnitude bit for unsigned types. If that bit is set, it either makes the value negative (it contributes -128) or it makes the value larger (it contributes +128), depending on whether or not it's a signed type.

The promotion from char to int will always happen, but if char is signed then converting it to int requires that the sign bit be replicated into the upper bits of the int, so that the int represents the same value as the char did.

Then printf gets hold of it, but it doesn't know whether the original type was signed or unsigned, and it doesn't know that it used to be a char. What it does know is that the format specifier asks for an unsigned hexadecimal short, so it prints the low 16 bits of that number as if they were an unsigned short. The bit pattern of -112 in those 16 bits is 1111111110010000; formatted as hex, that's ff90.

If your char is unsigned then 0x90 does not represent a negative value, and when you convert it to an int nothing needs to be changed in the int to make it represent the same value. The rest of the bit pattern is all zeroes and printf doesn't need those to display the number correctly.
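
A short demonstration of the above (again a sketch, assuming a two's-complement machine with a 32-bit int where plain char is signed):

#include <stdio.h>

int main (void) {
    signed char c = (signed char) 0x90;   /* value -112 */

    int promoted = c;                     /* sign bit replicated into the upper bits */
    printf ("promoted to int:       %08x\n", (unsigned int) promoted);    /* ffffff90 */
    printf ("low 16 bits (as %%hx): %04hx\n", (unsigned short) promoted); /* ff90     */
    printf ("as unsigned char:      %02x\n", (unsigned char) c);          /* 90       */
    return 0;
}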

The problem is simply caused by the format. %02hx receives an int argument (because of the integer promotions) and prints it as an unsigned short. When you take a character below 128, all is fine: it is positive and will not change when converted to an int.

Now, let's take a char above 127, say 0x90. As an unsigned char, its value is 144; it will be converted to an int of value 144 and be printed as 90. But as a signed char, its value is -112 (same bit pattern 0x90); it will be converted to an int of value -112 (whose low 16 bits are 0xff90) and be printed as ff90.

Because in unsigned char the most significant bit has a different meaning than in signed char.

For example, 0x90 in binary is 10010000, which is 144 decimal when unsigned, but -112 decimal when signed.

Whether or not char is signed is implementation-defined, i.e. it depends on the platform and compiler. This means that the sign bit may or may not be extended depending on your machine, and thus you can get different results.

However, using unsigned char ensures that there is no sign extension (because there is no sign bit anymore).
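
A quick way to check which case applies on a given machine is CHAR_MIN from <limits.h>: it is 0 when plain char is unsigned and negative when it is signed.

#include <limits.h>
#include <stdio.h>

int main (void) {
    /* CHAR_MIN is 0 if plain char is unsigned, negative (typically -128) if it is signed. */
    if (CHAR_MIN < 0) {
        printf ("plain char is signed here (CHAR_MIN = %d)\n", CHAR_MIN);
    } else {
        printf ("plain char is unsigned here (CHAR_MIN = %d)\n", CHAR_MIN);
    }
    return 0;
}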
