简体   繁体   中英

Undefined high-order of uint64_t while shifting and masking 32-bit values

I have some undefined behaviour in a seemingly innocuous function which is parsing a double value from a buffer. I read the double in two halves, because I am reasonably certain the language standard says that shifting char values is only valid in a 32-bit context.

inline double ReadLittleEndianDouble( const unsigned char *buf )
{
    uint64_t lo = (buf[3] << 24) | (buf[2] << 16) | (buf[1] << 8) | buf[0];
    uint64_t hi = (buf[7] << 24) | (buf[6] << 16) | (buf[5] << 8) | buf[4];
    uint64_t val = (hi << 32) | lo;
    return *(double*)&val;
}

Since I am storing 32-bit values into 64-bit variables lo and hi , I reasonably expect that the high-order 32-bits of these variables will always be 0x00000000 . But sometimes they contain 0xffffffff or other non-zero rubbish.

The fix is to mask it like this:

uint64_t val = ((hi & 0xffffffffULL) << 32) | (lo & 0xffffffffULL);

Alternatively, it seems to work if I mask during the assignment instead:

uint64_t lo = ((buf[3] << 24) | (buf[2] << 16) | (buf[1] << 8) | buf[0]) & 0xffffffff;
uint64_t hi = ((buf[7] << 24) | (buf[6] << 16) | (buf[5] << 8) | buf[4]) & 0xffffffff;

I would like to know why this is necessary. All I can think of to explain this is that my compiler is doing all the shifting and combining for lo and hi directly on 64-bit registers, and I might expect undefined behaviour in the high-order 32-bits if this is the case.

Can someone please confirm my suspicions or otherwise explain what is happening here, and comment on which (if any) of my two solutions is preferable?

If you try to shift a char or unsigned char you're leaving yourself at the mercy of the standard integer promotions. You're better off casting the values yourself, before you try to shift them. You don't have to separate the lower and upper halves if you do so.

inline double ReadLittleEndianDouble( const unsigned char *buf )
{
    uint64_t val = ((uint64_t)buf[7] << 56) | ((uint64_t)buf[6] << 48) | ((uint64_t)buf[5] << 40) | ((uint64_t)buf[4] << 32) |
                   ((uint64_t)buf[3] << 24) | ((uint64_t)buf[2] << 16) | ((uint64_t)buf[1] << 8) | (uint64_t)buf[0];
    return *(double*)&val;
}

All this is necessary only if the CPU is big-endian or if the buffer might not be properly aligned for the CPU architecture, otherwise you can simplify this greatly:

    return *(double*)buf;

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM