I tried to convert a byte array to a long
long readAndSkipLong(char*& b)
{
unsigned long ret = (b[0] << 56) | (b[1] << 48) | (b[2] << 40) | (b[3]<<32) | (b[4] << 24) | (b[5] << 16) | (b[6] << 8) | (b[7]);
return ret;
}
My shifting seems to be not right. For the intended value
152 --> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 10011000
I get:
-104 --> 11111111 11111111 11111111 11111111 11111111 11111111 11111111 10011000
Any idea where the bug is?
It's because of type promotion and sign extension. Every value in your char
array is signed, and bit-shifting is an integer operation. When you use the shifting operator, it evaluates to an int
, and because your char
s are signed, shifting them would produce signed int
s.
The last (rightmost) byte has 1
as the sign bit. When promoted to an int
, its value becomes -104
by sign extension. As you ORed the rest of the numbers, all the 1
bits remained unaffected.
To avoid this problem, you can cast each char
s to unsigned long
before shifting and ORing.
Another thing you can do is bitwise ANDing each char
with 0xff
like ((b[i] & 0xff) << 24)
. ANDing with 0xff
would produce an int
, keeping the least significant 8 bits intact and zeroes to the left, no sign extension.
2 things:
char
can be signed or unsigned , hence shouldn't be used storing for data types other than characters.
In C, C++ and most C-like languages, any types narrower than int
must be promoted to int
in an expression, and your statement will be treated like this
unsigned long ret = ((int)b[0] << 56) | ((int)b[1] << 48) | ((int)b[2] << 40) | ((int)b[3] << 32) | ((int)b[4] << 24) | ((int)b[5] << 16) | ((int)b[6] << 8) | ((int)b[7]);
If char
is signed, it'll be promoted to int
using sign extension . As a result the top bits will be filled with 1s if the byte value is negative.
In MSVC char
is signed by default. You can use /J
to make char unsigned, which will solve part of your problem. But then another problem arises:
In Windows long
is a 32-bit type , consequently you can't pack 8 bytes into it . Moreover int
is also 32-bit on most modern systems, and after promoting b[i]
to int shifting more than 31 is undefined behavior which is what your program does.
So to fix all the problems portably you need to:
b[i]
to unsigned char
or uint8_t
, or masking the high bits by ANDing it with 0xFF like 0605002 suggested. Or simply change the type of b
to unsigned char&*
instead of char&*
(unsigned) long long
, (u)int64_t
or (u)int_least64_t
The result may look like this
uint64_t readAndSkipLong(unsigned char*& b)
{
return ((uint64_t)b[0] << 56) | ((uint64_t)b[1] << 48)
| ((uint64_t)b[2] << 40) | ((uint64_t)b[3] << 32)
| ((uint64_t)b[4] << 24) | ((uint64_t)b[5] << 16)
| ((uint64_t)b[6] << 8) | ((uint64_t)b[7]);
}
or
uint64_t readAndSkipLong(char*& b)
{
return ((uint64_t)(uint8_t)b[0] << 56) | ((uint64_t)(uint8_t)b[1] << 48)
| ((uint64_t)(uint8_t)b[2] << 40) | ((uint64_t)(uint8_t)b[3] << 32)
| ((uint64_t)(uint8_t)b[4] << 24) | ((uint64_t)(uint8_t)b[5] << 16)
| ((uint64_t)(uint8_t)b[6] << 8) | ((uint64_t)(uint8_t)b[7]);
}
However you don't actually need to write a function to reverse endian. There are already ntohll()
and htonll()
for that purpose
reversedEndian = ntohll(originalValue);
If the input must be a char
array then just copy the value to a uint64_t
memcpy(&originalValue, &b, sizeof originalValue);
reversedEndian = ntohll(originalValue);
You can further reduce the whole thing to reversedEndian = ntohll(*(int64_t*)&b);
if strict aliasing is allowed because on x86 unaligned access is generally permitted
Couple of things to think about
cstdint
and use std::uint64_t
and std::uint8_t
for you types so that there is no issues with sign.uint64_t
and use that. Here's a bit of code I wrote for bytes to uint64_t
on a little endian machine.
std::uint64_t bytesToUint64(std::uint8_t* b) {
std::uint64_t msb = 0x0u;
for (int i(0); i < 7; i++) {
msb |= b[i];
msb <<= 8;
}
msb |= b[7];
return msb;
}
EDIT by OP (implemented Tip 1):
long readAndSkipLong(char*& b)
{
std::uint64_t ret =
((std::uint8_t)b[0] << 56) |
((std::uint8_t)b[1] << 48) |
((std::uint8_t)b[2] << 40) |
((std::uint8_t)b[3] << 32) |
((std::uint8_t)b[4] << 24) |
((std::uint8_t)b[5] << 16) |
((std::uint8_t)b[6] << 8) |
((std::uint8_t)b[7]);
b+=8;
return ret;
}
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.