I was writing a function that prints the "hexdump" of a given file. The function is as stated below:
bool printhexdump (FILE *fp) {
    long unsigned int filesize = 0;
    char c;
    if (fp == NULL) {
        return false;
    }
    while (! feof (fp)) {
        c = fgetc (fp);
        if (filesize % 16 == 0) {
            if (filesize >= 16) {
                printf ("\n");
            }
            printf ("%08lx ", filesize);
        }
        printf ("%02hx ", c);
        filesize++;
    }
    printf ("\n");
    return true;
}
However, on certain files, some invalid integer representations seem to get printed, for example:
00000000 4d 5a ff90 00 03 00 00 00 04 00 00 00 ffff ffff 00 00
00000010 ffb8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000030 00 00 00 00 00 00 00 00 00 00 00 00 ff80 00 00 00
00000040 ffff
Except for the last ffff, which is caused by EOF, the ff90, ffff, ffb8, etc. are wrong. However, if I change char to unsigned char, I get the correct representation:
00000000 4d 5a 90 00 03 00 00 00 04 00 00 00 ff ff 00 00
00000010 b8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000030 00 00 00 00 00 00 00 00 00 00 00 00 80 00 00 00
00000040 ff
Why would the above behaviour happen?
Edit: the treatment of c by printf() should be the same since the format specifiers don't change. So I'm not sure how char would get sign extended while unsigned char won't?
Q: the treatment of c by printf() should be the same since the format specifiers don't change.
A: OP is correct, the treatment of c by printf() did not change. What changed was what was passed to printf(). Whether it is a char or an unsigned char, c goes through the usual integer promotions, typically to int. A char, if signed, gets a sign extension: a char holding the bit pattern 0xFF has the value -1 and is passed as the int -1. An unsigned char value like 0xFF remains 255.
Q: So I'm not sure how char would get sign extended while unsigned char won't?
A: Both are widened when promoted. A char may be negative, so the bits added on the left may be all 0s or all 1s. An unsigned char is never negative, so the added bits are always 0s.
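The promotion described above can be observed directly. This is a minimal sketch, not part of the original answer: promote_signed, promote_unsigned and show_promotion are hypothetical names, and signed char is used explicitly so the result does not depend on whether plain char is signed on a given platform.

```c
#include <stdio.h>

/* Hypothetical helpers: returning int performs the same value-preserving
   conversion that the integer promotions apply to a printf argument. */
int promote_signed(signed char c)     { return c; }
int promote_unsigned(unsigned char c) { return c; }

/* Prints the promoted values of a byte whose bits are all set. */
void show_promotion(void) {
    printf("%d %d\n", promote_signed(-1), promote_unsigned(0xFF));
}
```

Calling show_promotion() should print -1 and 255: the same eight bits, but two different int values by the time printf() sees them.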
Solution
char c;
printf ("%02x ", (unsigned char) c);
// or
printf ("%02hhx ", c);
// or
unsigned char c;
printf ("%02x ", c);
// or
printf ("%02hhx ", c);
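Putting the third option into the original function might look like the sketch below. It is only one possible arrangement: hexdump_stream and its out parameter are hypothetical (the original prints to stdout), and the loop is changed to test the return value of fgetc() against EOF, which is an addition beyond what the answer covers but also removes the stray trailing ff that the question mentions.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical variant of printhexdump that writes the dump to `out`. */
bool hexdump_stream(FILE *in, FILE *out) {
    unsigned long offset = 0;
    int c; /* int, so that EOF stays distinguishable from the byte 0xff */

    if (in == NULL || out == NULL) {
        return false;
    }
    /* Stop on the EOF return value instead of polling feof() up front. */
    while ((c = fgetc(in)) != EOF) {
        if (offset % 16 == 0) {
            if (offset >= 16) {
                fputc('\n', out);
            }
            fprintf(out, "%08lx ", offset);
        }
        /* fgetc() returns the byte as an unsigned char converted to int,
           so c is in 0..255 here and no sign extension can occur. */
        fprintf(out, "%02x ", (unsigned int)c);
        offset++;
    }
    fputc('\n', out);
    return true;
}
```

A call such as hexdump_stream(fp, stdout) would then stand in for printhexdump(fp).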
char can be a signed type, and in that case values 0x80 to 0xff get sign-extended before being passed to printf.
(char)0x80 is sign-extended to -128, which in unsigned short is 0xff80.
[edit] To be clearer about promotion: the value stored in a char is eight bits, and in that eight-bit representation a bit pattern like 0x90 represents either -112 or 144, depending on whether the char is signed or unsigned. This is because the most significant bit is taken as the sign bit for signed types and as a magnitude bit for unsigned types. If that bit is set, it either makes the value negative (by subtracting 128) or makes it larger (by adding 128), depending on whether or not it's a signed type.
The promotion from char to int will always happen, but if char is signed then converting it to int requires that the sign bit be copied into all the upper bits, up to the sign bit of the int, so that the int represents the same value as the char did.
Then printf gets ahold of it, but printf doesn't know whether the original type was signed or unsigned, and it doesn't know that it used to be a char. What it does know is that the format specifier is for an unsigned hexadecimal short, so it prints that number as if it were an unsigned short. The bit pattern for -112 in a 16-bit int is 1111111110010000; formatted as hex, that's ff90.
If your char is unsigned then 0x90 does not represent a negative value, and when you convert it to an int nothing needs to be changed in the int to make it represent the same value. The rest of the bit pattern is all zeroes and printf doesn't need those to display the number correctly.
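The two steps described here (promotion to int, then printing as an unsigned short) can be mimicked with explicit conversions. This is a sketch under the assumption that unsigned short is 16 bits wide, which is common but not guaranteed; as_hx_argument is a hypothetical name.

```c
/* Hypothetical helper mirroring what happens to a %hx argument:
   the char is promoted to int, then the value is converted to
   unsigned short before it is printed. */
unsigned short as_hx_argument(signed char c) {
    int promoted = c;                /* value preserved, e.g. -112 stays -112 */
    return (unsigned short)promoted; /* wraps modulo USHRT_MAX + 1 */
}
```

With a 16-bit unsigned short, as_hx_argument(-112) yields 0xff90, while any non-negative input comes back unchanged.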
The problem is simply caused by the format. %02hx takes an int. When you take a character below 128, all is fine: it is positive and will not change when converted to an int.
Now, let's take a char of 128 or above, say 0x90. As an unsigned char, its value is 144; it will be converted to an int value of 144 and be printed as 90. But as a signed char, its value is -112 (still 0x90); it will be converted to an int of value -112 (0xff90 for a 16-bit int) and be printed as ff90.
Because in unsigned char the most significant bit has a different meaning than in signed char.
For example, 0x90 in binary is 10010000, which is 144 decimal when unsigned, but -112 decimal when signed (144 - 256).
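The arithmetic for the signed reading can be checked with a small helper. This is a sketch that assumes 8-bit bytes and two's complement; as_signed_8bit is a hypothetical name.

```c
/* Hypothetical helper: the value of an 8-bit pattern when it is read
   as a two's complement signed number. */
int as_signed_8bit(unsigned char bits) {
    /* With the top bit set, the pattern stands for (unsigned value - 256). */
    return bits >= 128 ? (int)bits - 256 : (int)bits;
}
```

For 0x90 this gives 144 - 256 = -112, and patterns below 0x80 keep the same value in both readings.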
Whether or not char is signed is platform-dependent (the C standard leaves it implementation-defined). This means that the sign bit may or may not be extended depending on your machine and compiler, and thus you can get different results.
However, using unsigned char ensures that there is no sign extension (because there is no sign bit anymore).
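To find out which behaviour a given platform has, CHAR_MIN from <limits.h> can be inspected. A minimal sketch; plain_char_is_signed is a hypothetical name.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: CHAR_MIN is negative exactly when plain char
   behaves like signed char on this implementation. */
bool plain_char_is_signed(void) {
    return CHAR_MIN < 0;
}
```

If this returns true, a plain char holding a byte of 0x80 or above is negative and will be sign-extended when promoted.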