
How does C++ look at a pointer to an unsigned char when it's treated like an array?

I'm trying to decipher some code, and it seems to be treating values in sequential memory addresses backwards from what I expected. The address of a 64-bit unsigned integer is being cast to a pointer to unsigned char, and the integer is then read one byte at a time. Here's a simplified version of it:

unsigned char* ucMyChar;
unsigned __int64 ui64MyInt;
CString strMyString;

//some code that assigns a value to ui64MyInt

ucMyChar = (unsigned char*)&ui64MyInt;

strMyString.Format("%02x%02x%02x%02x-%02x%02x-1%01x%02x",
                    ucMyChar[3], ucMyChar[2], ucMyChar[1], ucMyChar[0],
                    ucMyChar[5], ucMyChar[4], ucMyChar[7], ucMyChar[6]);

If the value of ui64MyInt was:

0x010203040a0b0c0d

Which of the following would be the correctly formatted string?

04030201-0b0a-1d0c

or

0a0b0c0d-0304-1102

The reason I'm asking is that I've got one of these strings and I'm trying to run the math in this code backwards. Some needed information is included in the original values used to generate the string, and file corruption has left no other way to recover it. So far the values I'm coming up with using the first string seem to be way out of the expected range, and I'm not sure whether I'm making math errors or I don't understand the way unsigned char pointers work.

It's implementation-defined in which order the bytes of an integer are stored in memory. (That means the compiler gets to decide, and it almost certainly bases that decision on how the CPU itself stores an integer in memory.)

The two most common layouts are (lowest address first)

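  • 01 02 03 04 0a 0b 0c 0d (big-endian; typical examples: SPARC, or ARM configured for big-endian operation)
  • 0d 0c 0b 0a 04 03 02 01 (little-endian; typical examples: x86/x64, and ARM in its usual configuration)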

Other layouts are also possible. For example, if a compiler for a 32-bit CPU rolled its own support for __int64 by placing two 32-bit ints next to each other, it might even go:

  • 04 03 02 01 0d 0c 0b 0a
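
To see which of the two candidate strings each of the two common layouts produces, here is a minimal sketch. It assumes std::snprintf as a stand-in for CString::Format (so it builds without MFC); the function name and the two array names are hypothetical, while the format string and the index order are copied from the question.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Formats eight bytes the same way as the Format() call in the question.
// std::snprintf replaces CString::Format (an assumption of this sketch).
std::string format_like_question(const unsigned char* b) {
    assert(b != nullptr);
    char buf[32];  // the longest possible result is 19 characters plus '\0'
    std::snprintf(buf, sizeof buf, "%02x%02x%02x%02x-%02x%02x-1%01x%02x",
                  b[3], b[2], b[1], b[0], b[5], b[4], b[7], b[6]);
    return buf;
}

// The two common layouts of 0x010203040a0b0c0d, lowest address first.
const unsigned char kHighByteFirst[8] = {0x01, 0x02, 0x03, 0x04, 0x0a, 0x0b, 0x0c, 0x0d};
const unsigned char kLowByteFirst[8]  = {0x0d, 0x0c, 0x0b, 0x0a, 0x04, 0x03, 0x02, 0x01};
```

With the low-byte-first layout (x86/x64) the result is 0a0b0c0d-0304-1102; with the high-byte-first layout it is 04030201-0b0a-1d0c. Note that %01x only sets a minimum width, so a byte larger than 0x0f in ucMyChar[7] would print as two digits and lengthen the last group.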

The C and C++ languages are carefully crafted so that this detail is not significant; you can write your code so that it works the same regardless of which representation is in use.
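
As a sketch of what representation-independent code can look like, the bytes can be taken from the value with shifts instead of from memory. The helper name byte_of is hypothetical, and std::uint64_t is assumed here as the portable equivalent of unsigned __int64.

```cpp
#include <cassert>
#include <cstdint>

// Returns byte i of v, counting from the least significant byte (i == 0).
// Shifts operate on the value, not on its storage, so the result does not
// depend on how the machine lays the integer out in memory.
unsigned char byte_of(std::uint64_t v, unsigned i) {
    assert(i < 8);
    return static_cast<unsigned char>((v >> (8 * i)) & 0xffu);
}
```

For 0x010203040a0b0c0d, byte_of(v, 0) is 0x0d and byte_of(v, 7) is 0x01 on every platform, which is what ucMyChar[0] and ucMyChar[7] happen to hold on x86/x64.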

When someone writes code like:

ucMyChar = (unsigned char*)&ui64MyInt;

they are consciously bypassing C++'s facilities to behave independently of the integer representation. (A cast is a good sign that some bypassing of the type system is going on!)
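
For the reverse direction the question asks about, here is a rough sketch that rebuilds the integer from the formatted string. It assumes the string was produced on a machine that stores the lowest byte first (such as x86/x64), and the function name is hypothetical. Under that assumption the first group holds bits 0-31, the second group bits 32-47, and the last group (after the literal 1) bits 48-63.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: recovers the 64-bit value from a string in the
// question's format, ASSUMING it was written on a low-byte-first machine.
std::uint64_t recover_low_byte_first(const std::string& s) {
    const std::size_t d1 = s.find('-');
    assert(d1 != std::string::npos);
    const std::size_t d2 = s.find('-', d1 + 1);
    assert(d2 != std::string::npos);
    // The third group starts with the literal '1' from the format string.
    assert(s.size() > d2 + 2 && s[d2 + 1] == '1');

    const std::uint64_t low  = std::stoull(s.substr(0, d1), nullptr, 16);
    const std::uint64_t mid  = std::stoull(s.substr(d1 + 1, d2 - d1 - 1), nullptr, 16);
    const std::uint64_t high = std::stoull(s.substr(d2 + 2), nullptr, 16);
    return (high << 48) | (mid << 32) | low;
}
```

If the string came from a machine with a different layout, the groups map to different bits and this helper would give the wrong value.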
