
24-bit to 32-bit conversion in C++

I need to convert a 24-bit integer (two's complement) to a 32-bit integer in C++. I found a solution here, which is given as

int interpret24bitAsInt32(unsigned char* byteArray)
 {     
    return (  
        (byteArray[0] << 24)
    |   (byteArray[1] << 16)
    |   (byteArray[2] << 8)
    ) >> 8;  
}

Although it works, I have the following concern about this code. byteArray[0] is only 8 bits wide, so how can an operation like byteArray[0] << 24 be possible? It would be possible if the compiler up-converts byteArray[0] to an integer and then performs the operation, which may be why it works now. My question is whether this behaviour is guaranteed on all compilers and explicitly mentioned in the standard. It is not obvious to me, as we never explicitly tell the compiler that the target is a 32-bit integer!

Also, please let me know whether any improvement, such as vectorization, is possible to increase the speed (maybe using C++11), as I need to convert a huge amount of 24-bit data to 32-bit.

int32_t interpret24bitAsInt32(unsigned char* byteArray)
{     
    int32_t number =
        (((int32_t)byteArray[0]) << 16)
    |   (((int32_t)byteArray[1]) << 8)
    |   byteArray[2];
    if (number >= ((int32_t)1) << 23)
        //return (uint32_t)number | 0xFF000000u;
        return number - 16777216;
    return number;
}

This function should do what you want without shifting a 1 into the sign bit of int, which would invoke undefined behavior.
The int32_t cast is only necessary if sizeof(int) < 4; otherwise the default integer promotion to int happens.

If someone does not like the if: the compiler (gcc 9.2) does not translate it into a conditional jump: https://godbolt.org/z/JDnJM2
It leaves a cmovg (conditional move) instead.
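For converting many samples at once, as the question asks, the same arithmetic can be put in a plain loop. This is only a sketch: convert24to32 and its parameters are hypothetical names, the input is assumed to be packed 24-bit values with the most significant byte first (as in the function above), and whether a given compiler vectorizes the loop is something to check in the generated assembly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of the answer): converts `count` packed
// 24-bit samples (most significant byte first) from `src` into `dst`.
void convert24to32(const unsigned char* src, std::int32_t* dst, std::size_t count)
{
    assert(count == 0 || (src != nullptr && dst != nullptr));
    for (std::size_t i = 0; i < count; ++i) {
        const unsigned char* p = src + 3 * i;
        std::int32_t number =
            (static_cast<std::int32_t>(p[0]) << 16)
          | (static_cast<std::int32_t>(p[1]) << 8)
          | p[2];
        // Values with bit 23 set are negative: subtract 2^24 to sign-extend.
        if (number >= (static_cast<std::int32_t>(1) << 23))
            number -= 16777216;
        dst[i] = number;
    }
}
```

The loop body has no dependency between iterations, which is the precondition for auto-vectorization, but measuring on the target platform is the only way to know whether it helps.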

Integral promotions [conv.prom] are performed on the operands of a shift expression [expr.shift]/1. In your case, that means that your values of type unsigned char will be converted to type int before << is applied [conv.prom]/1. Thus, the C++ standard guarantees that the operands are "up-converted".

However, the standard only guarantees that int has at least 16 bits. There is also no guarantee that unsigned char has exactly 8 bits (it may have more). Thus, it is not guaranteed that int is always large enough to represent the result of these left shifts. If int does not happen to be large enough, the resulting signed integer overflow will invoke undefined behavior [expr]/4. Chances are that int has 32 bits on your target platform and, thus, everything works out in the end.

If you need to work with a guaranteed, fixed number of bits, I would generally recommend using fixed-width integer types, for example:
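The platform assumptions named above can be turned into compile-time checks, so that a build on an unusual target fails loudly instead of misbehaving. A minimal sketch; platformMatchesAssumptions is a hypothetical name introduced here for illustration:

```cpp
#include <cassert>
#include <climits>
#include <limits>

// Hypothetical helper: true if bytes have 8 bits and int has at least
// 31 value bits (i.e. it is at least 32 bits wide including the sign).
constexpr bool platformMatchesAssumptions()
{
    return CHAR_BIT == 8 && std::numeric_limits<int>::digits >= 31;
}

// Compilation stops here on platforms where the shift-based code is unsafe.
static_assert(platformMatchesAssumptions(),
              "24-bit conversion code assumes 8-bit bytes and a 32-bit int");
```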

std::int32_t interpret24bitAsInt32(const std::uint8_t* byteArray)
{     
    return
        static_cast<std::int32_t>(
            (std::uint32_t(byteArray[0]) << 24) | 
            (std::uint32_t(byteArray[1]) << 16) | 
            (std::uint32_t(byteArray[2]) <<  8)
        ) >> 8;
}

Note that right shift of a negative value is currently implementation-defined [expr.shift]/3. Thus, it is not strictly guaranteed that this code will end up performing sign extension on a negative number. However, your compiler is required to document what exactly right-shifting a negative integer does [defns.impl.defined] (i.e., you can go and make sure it does what you need). And I have never heard of a compiler that does not implement right shift of a negative value as an arithmetic shift in practice. Also, it looks like C++20 is going to mandate arithmetic shift behavior…
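If relying on the right shift is a concern, sign extension can also be done with plain arithmetic. The sketch below swaps in a different, well-known technique (flip the sign bit, then subtract its weight) rather than the shift pair used above; interpret24bitAsInt32Portable is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical alternative: sign-extends without right-shifting a negative value.
std::int32_t interpret24bitAsInt32Portable(const std::uint8_t* byteArray)
{
    assert(byteArray != nullptr);
    // Assemble the raw 24-bit pattern in an unsigned type: range [0, 2^24).
    const std::uint32_t raw =
        (std::uint32_t(byteArray[0]) << 16) |
        (std::uint32_t(byteArray[1]) <<  8) |
         std::uint32_t(byteArray[2]);
    const std::uint32_t signBit = UINT32_C(1) << 23;
    // Flipping bit 23 keeps the value in [0, 2^24), so the cast is lossless;
    // subtracting 2^23 then shifts the range to [-2^23, 2^23).
    return static_cast<std::int32_t>(raw ^ signBit)
         - static_cast<std::int32_t>(signBit);
}
```

Every intermediate value fits in a non-negative std::int32_t, so no step depends on implementation-defined conversions or shifts.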

[expr.shift]/1 The operands shall be of integral or unscoped enumeration type and integral promotions are performed. The type of the result is that of the promoted left operand...

[conv.prom] 7.6 Integral promotions

1 A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion rank (7.15) is less than the rank of int can be converted to a prvalue of type int if int can represent all the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned int.

So yes, the standard requires that an argument of a shift operator, that has the type unsigned char , be promoted to int before the evaluation.
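The promotion can be observed directly by inspecting the type of a shift expression. A small sketch, assuming a platform where int can represent every unsigned char value (the usual case); shiftByte is a hypothetical name, and the shift count of 4 is chosen so the result fits even in a 16-bit int:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical demo: shifts a byte left by 4 bits and checks the promoted type.
int shiftByte(unsigned char b)
{
    auto shifted = b << 4;  // b is promoted to int before the shift
    static_assert(std::is_same<decltype(shifted), int>::value,
                  "expected unsigned char to be promoted to int");
    // The high bits survive: nothing is truncated back to 8 bits.
    assert(shifted == b * 16);
    return shifted;
}
```

Calling shiftByte(0xFF) yields 0xFF0 rather than 0xF0, which shows the shift was carried out on an int and not on an 8-bit value.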


That said, the technique in your code relies on int a) being 32 bits wide, and b) using two's complement to represent negative values. Neither is guaranteed by the standard, though both are common on modern systems.

A version without a branch, but with a multiplication:

int32_t interpret24bitAsInt32(unsigned char* bytes) {
  // 0xFF if the sign bit (bit 7 of the first byte) is set, 0x00 otherwise
  uint32_t const msb = UINT32_C(0xFF) * (bytes[0] >> 7);
  uint32_t const number =
        (msb << 24)
      | (uint32_t(bytes[0]) << 16)
      | (uint32_t(bytes[1]) << 8)
      |  bytes[2];
  return number;
}

You need to test if omitting the branch really gives you a performance advantage, though!

Adapted from older code of mine which did this for 10-bit numbers. Test before use!

Oh, and it still relies upon implementation-defined behaviour with regard to the conversion from uint32_t to int32_t. If you want to go down that rabbit hole, have fun but be warned.

Or, much simpler: use the trick from mch's answer, and also use shifts instead of multiplication:
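For readers who do want to avoid the implementation-defined conversion, one common approach is to handle the upper half of the unsigned range by hand. This is a sketch of that idea, not something taken from the answer; toInt32 is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: maps a uint32_t bit pattern to the int32_t it would
// represent in two's complement, using only value-preserving conversions.
std::int32_t toInt32(std::uint32_t u)
{
    if (u <= UINT32_C(0x7FFFFFFF))
        return static_cast<std::int32_t>(u);  // fits as-is
    // u is in [2^31, 2^32): remove the top bit (result fits in int32_t),
    // then add -2^31 to land in [-2^31, -1].
    const std::int32_t result =
        static_cast<std::int32_t>(u - UINT32_C(0x80000000)) + INT32_MIN;
    assert(result < 0);
    return result;
}
```

With such a helper, the final return in the function above could be written as return toInt32(number); whether that is worth the extra code depends on how exotic the target platforms are.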

int32_t interpret24bitAsInt32(unsigned char* bytes) {
  int32_t const number =
        (bytes[0] << INT32_C(16))
      | (bytes[1] << INT32_C(8))
      |  bytes[2];
  int32_t const correction = 
     (bytes[0] >> UINT8_C(7)) << INT32_C(24);
  return number - correction;
}

Test case
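The linked test case is not reproduced here. A check of this kind could look like the following sketch, which encodes every 24-bit value into three bytes and decodes it again. decode24 is adapted from the shift-and-correct function above (with explicit int32_t casts added), and roundTripsAll24BitValues is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Decoder adapted from the shift-and-correct variant above.
std::int32_t decode24(const unsigned char* bytes)
{
    const std::int32_t number =
        (std::int32_t(bytes[0]) << 16)
      | (std::int32_t(bytes[1]) << 8)
      |  bytes[2];
    // 2^24 if the sign bit of the 24-bit value is set, 0 otherwise.
    const std::int32_t correction = std::int32_t(bytes[0] >> 7) << 24;
    return number - correction;
}

// Hypothetical checker: round-trips all values in [-2^23, 2^23 - 1].
bool roundTripsAll24BitValues()
{
    for (std::int32_t v = -8388608; v <= 8388607; ++v) {
        // Converting to unsigned first avoids shifting negative values.
        const std::uint32_t u = static_cast<std::uint32_t>(v) & UINT32_C(0xFFFFFF);
        const unsigned char bytes[3] = {
            static_cast<unsigned char>(u >> 16),
            static_cast<unsigned char>((u >> 8) & 0xFF),
            static_cast<unsigned char>(u & 0xFF)
        };
        if (decode24(bytes) != v)
            return false;
    }
    return true;
}
```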

Integral promotion is indeed applied to types smaller than int by the arithmetic operators.

So, assuming sizeof(char) < sizeof(int), in

byteArray[0] << 24

byteArray[0] is promoted to int and the bit shift is performed on int.

The first issue is that int may be only 16 bits wide.

The second issue (before C++20) is that int is signed, and bitwise shifts on signed values can easily lead to implementation-defined or undefined behavior (and you get both for negative 24-bit numbers).

In C++20, the behavior of bitwise shifts has been simplified (it is now fully defined) and the problematic UB has been removed.

The leading 1 bits of a negative number are kept in neg >> 8.
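Under those C++20 rules, the shift-up, shift-down approach from the question becomes well-defined as long as the operands are at least 32 bits wide. A minimal sketch, assuming a C++20 compiler; interpret24bitAsInt32Cpp20 is a hypothetical name:

```cpp
#include <cassert>
#include <cstdint>

// Sketch that relies on C++20 shift semantics: a signed left shift wraps
// modulo 2^32, and a signed right shift is arithmetic.
std::int32_t interpret24bitAsInt32Cpp20(const unsigned char* byteArray)
{
    assert(byteArray != nullptr);
    // Place the 24-bit value in the top three bytes of a 32-bit integer.
    const std::int32_t shifted =
        (std::int32_t(byteArray[0]) << 24)
      | (std::int32_t(byteArray[1]) << 16)
      | (std::int32_t(byteArray[2]) << 8);
    // Shifting back down replicates the sign bit into the top byte.
    return shifted >> 8;
}
```

Many compilers already behave this way in earlier language modes, but before C++20 that is a property of the implementation rather than a guarantee of the standard.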

So before C++20, you have to do something like:

std::int32_t interpret24bitAsInt32(const unsigned char* byteArray)
{
    const std::int32_t res =
        (std::int32_t(byteArray[0]) << 16)
      | (byteArray[1] << 8)
      | byteArray[2];
    // Largest positive value of a signed 24-bit integer: 2^23 - 1
    const std::int32_t int24Max = (std::int32_t(1) << 23) - 1;
    return res <= int24Max ?
               res : // Positive 24 bit numbers
               res - (std::int32_t(1) << 24); // Negative number: subtract 2^24
}
