
C++: Unsigned Char to unsigned int using pointers without bit shifts

Suppose I have a C-style array of type unsigned char:

unsigned char * c = (unsigned char *) malloc(5000 * sizeof(unsigned char));
for(int i = 0; i < 5000; i++) 
    c[i] = (unsigned char) ((i >> (i%4 * 8)) & 0xFF);

Suppose I have an offset into the array at which a 4-byte integer starts:

// the 4-byte integer starts at byte offset 10 (for example)
const unsigned char * p = c + 10;

To load an unsigned int i with the correct value, I can do:

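// Convert each byte before shifting: unsigned char promotes to int, and a byte >= 0x80 shifted left by 24 can overflow int.
unsigned int i = ((unsigned int) *(c+10) << 24) + ((unsigned int) *(c+11) << 16) + ((unsigned int) *(c+12) << 8) + (unsigned int) *(c+13);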

But shouldn't I just be able to, somehow, do this using casts?

// pseudo code -- I haven't gotten this to work yet: 

int i = (unsigned int) (*((void *)(c+10));

// or maybe
int i = *((unsigned int*)((void *)(c+10)));

In short, what is the cleanest, most efficient way to convert four bytes in a C-style byte array to an unsigned int?

The proper way to do this is to use memcpy:

#include <cstring>

unsigned int i;
std::memcpy(&i, c + offset, sizeof(unsigned int)); // offset = byte position of the integer in c

On architectures that support unaligned memory access (like x86-64), the compiler will typically optimize this into a single load instruction. On targets that don't support unaligned access (such as some older ARM cores), it will generate code that reads the bytes safely.
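A minimal, self-contained sketch of this approach follows, assuming the bytes are already in the host's native byte order. The helper names `load_native_u32` and `store_native_u32` are hypothetical, and `std::uint32_t` stands in for `unsigned int` so that the width is exactly 32 bits on every platform.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: read a 32-bit unsigned value starting at p.
// p may be misaligned; the 4 bytes are assumed to be in the host's native order.
std::uint32_t load_native_u32(const unsigned char* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);  // byte copy: no alignment or aliasing problems
    return v;
}

// Hypothetical helper: the inverse, writing v at p in the host's native order.
void store_native_u32(unsigned char* p, std::uint32_t v) {
    std::memcpy(p, &v, sizeof v);
}
```

Storing a value at a misaligned offset such as `buf + 3` and loading it back returns the same value, because both directions copy bytes instead of dereferencing a misaligned `std::uint32_t*`.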

See, for example, https://gcc.godbolt.org/z/l5Px4G. Switch the compiler between GCC for x86 and GCC for ARM and compare the generated instructions.

Keep endianness in mind if the data comes from an external source: you may have to reverse the byte order of the integer for the value to make sense.
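As a sketch of that byte flipping, assume the external source stores integers big-endian ("network byte order") and the host is either little- or big-endian (no mixed orders). The value can then be loaded with `std::memcpy` and swapped only when the host order differs. All three function names are hypothetical; on C++20 and C++23 compilers, `std::endian` and `std::byteswap` from `<bit>` can replace the hand-written helpers.

```cpp
#include <cstdint>
#include <cstring>

// Reverse the byte order of a 32-bit value, e.g. 0x11223344 -> 0x44332211.
std::uint32_t swap_bytes32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8) |
           ((v & 0xFF000000u) >> 24);
}

// Detect at run time whether the host stores the least significant byte first.
// Assumes the host is either little- or big-endian.
bool host_is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first_byte;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

// Read a 32-bit value that the external source stored big-endian,
// swapping only if the host byte order differs.
std::uint32_t load_big_endian_u32(const unsigned char* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return host_is_little_endian() ? swap_bytes32(v) : v;
}
```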

No, you shouldn't. A pointer into an allocated block at an offset that isn't a multiple of the target type's alignment (such as c + 10 for a 4-byte unsigned int) may be a pointer the platform cannot dereference as that type. Moreover, the bytes there were written as unsigned char, not as an unsigned int, so it simply isn't a pointer to an unsigned int.

On some platforms, misaligned access makes performance atrocious. On others, the code will fault.
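To make the alignment point concrete, here is a small sketch of an alignment check. `is_aligned_for_uint` is a hypothetical helper, and inspecting the low bits of a `std::uintptr_t` is implementation-defined, though it behaves as expected on mainstream platforms. Because `malloc` returns memory aligned for any fundamental type, `c` itself passes the check, but `c + 10` fails whenever `alignof(unsigned int)` is 4, since 10 is not a multiple of 4.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: true if p is suitably aligned to be accessed as an unsigned int.
// The pointer-to-integer conversion is implementation-defined but works as
// expected on common flat-address-space platforms.
bool is_aligned_for_uint(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(unsigned int) == 0;
}
```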

In any event, the shifts and adds are very clear and easy to understand. The cast is more confusing and requires understanding the platform's byte ordering. So you're not making things better, simpler, or clearer.

But shouldn't I just be able to, somehow, do this using casts?

No, there's no cast that's guaranteed to work.


Note that there are many ways to represent an integer. How to convert an array of bytes to an integer object depends on how the integer is represented in the array. If an integer is converted to an array of bytes and sent over a network, for example, you cannot know whether the receiving computer uses the same representation.

One consideration is how negative numbers are represented. Luckily, two's complement is so ubiquitous that we can usually ignore this (C++20 even mandates it for signed integers). In your case it matters even less, since you're converting to an unsigned integer.

A more relevant consideration is byte endianness.

If you know that the array uses the same representation as the CPU that executes the program, you can copy the bytes using std::memcpy:

#include <cstring>

unsigned int i;
static_assert(sizeof i == 4, "assumes a 4-byte unsigned int");
std::memcpy(&i, c + 10, sizeof i);

This works correctly regardless of the endianness used by the CPU, as long as the source data is in the same representation.


Your suggestion, (*(c+10) << 24) + ..., is correct (or appears to be; I didn't check it thoroughly) if the byte array is big-endian. It is wrong if the array is little-endian or uses some other byte order.

This approach is useful when receiving data over a network, since it does not rely on the data's representation matching that of the executing CPU.
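As a minimal sketch, the shift-based approach can be wrapped in one helper per byte order. The names `read_be32` and `read_le32` are hypothetical. Each byte is converted to `std::uint32_t` before shifting because `unsigned char` is promoted to `int`, and an uncast `*(c+10) << 24` can overflow `int` (undefined behavior before C++20) when the byte is 0x80 or larger.

```cpp
#include <cstdint>

// Read 4 bytes stored most significant byte first (big-endian).
std::uint32_t read_be32(const unsigned char* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
           static_cast<std::uint32_t>(p[3]);
}

// Read 4 bytes stored least significant byte first (little-endian).
std::uint32_t read_le32(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}
```

Neither helper depends on how the executing CPU stores integers, so both give the same result on any host; pick the one that matches the byte order of the data, e.g. `read_be32(c + 10)` for a big-endian array.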
