C++: Union vs Bitwise operators

I have two chars and I want to "stitch" them together bitwise.
For example:

char c1 = 11; // 0000 1011
char c2 = 5;  // 0000 0101
short int si = stitch(c1, c2); // 0000 1011 0000 0101

So, what I first tried was with bitwise operators:

short int stitch(char c1, char c2)
{
    return (c1 << 8) | c2;
}

But this doesn't work: I get a short equal to c2. (1) Why?
(But: c1 and c2 are negative numbers in my real app... maybe this is part of the problem?)

So, my second solution was to use a union :

union stUnion
{
    struct
    {
         char c1;
         char c2;
    };            // anonymous struct: a common compiler extension, not standard C++
    short int si;
};

short int stitch(char c1, char c2)
{
    stUnion u;
    u.c1 = c1;
    u.c2 = c2;
    return u.si;
}

This works like I want... I think

(2) What is the best/fastest way?

Thanks!

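The union method is implementation-defined at best (formally, C++ treats reading a union member other than the one most recently written as undefined behavior). In practice it will pretty reliably work, but the format of si depends on the endianness of the platform.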

The problem with the bitwise way is, as you suspect, the negative numbers. A negative number is represented by a chain of leading 1's. So -5 for example is

1111 1011

If you cast this to int or even unsigned int, it becomes

1111 1111 1111 … 1111 1011

and all those 1's will drown out the left-shifted data when OR is applied.
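
The effect can be checked at compile time. This is only a minimal sketch, assuming an 8-bit char, an int wider than 16 bits, and two's complement representation. naive_or is a hypothetical helper, and signed char is used so the result does not depend on whether plain char is signed. The high byte is passed in already shifted, to keep the example away from shifting a negative value:

```cpp
// Hypothetical helper: ORs an already-shifted high byte with a low byte
// held in a signed char. Promoting `lo` to int sign-extends it.
constexpr int naive_or(int shifted_hi, signed char lo)
{
    return shifted_hi | lo;
}

// -5 is 1111 1011 as a byte; after promotion every higher bit is 1 as well,
// so the OR gives -5 back and the high byte (0x0B00) is lost.
static_assert(naive_or(0x0B00, -5) == -5, "sign extension swamps the high byte");

// Converting to unsigned char first keeps only the low 8 bits (0xFB).
static_assert((0x0B00 | static_cast<unsigned char>(-5)) == 0x0BFB,
              "high byte preserved");
```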

To solve the problem, cast the chars to unsigned char and then to int (to prevent overflow, or even the appearance of the possibility of overflow) before shifting:

short int stitch(char c1, char c2)
{
    return ( (int) (unsigned char) c1 << 8) | (unsigned char) c2;
}

or, if you are free to change the types of the arguments and you can include <cstdint>,

uint16_t stitch( uint8_t c1, uint8_t c2)
{
    return ( (int) c1 << 8 ) | c2;
}
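
As a quick check of this second variant against the values from the question, here is a sketch assuming <cstdint> provides the exact-width types. The function is renamed stitch_u8 (a hypothetical name) and marked constexpr so the checks run at compile time:

```cpp
#include <cstdint>

constexpr std::uint16_t stitch_u8(std::uint8_t hi, std::uint8_t lo)
{
    // hi is widened to int before the shift, so no bits fall off the top.
    return static_cast<std::uint16_t>((static_cast<int>(hi) << 8) | lo);
}

// 11 and 5 from the question: 0000 1011 0000 0101
static_assert(stitch_u8(11, 5) == 0x0B05, "positive bytes");

// Bytes that would be negative as signed char (-2 and -3) combine cleanly too.
static_assert(stitch_u8(0xFE, 0xFD) == 0xFEFD, "bytes with the high bit set");
```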

§5.8/1 states: "The operands shall be of integral or enumeration type and integral promotions are performed. The type of the result is that of the promoted left operand. The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand."

So try casting c1 to unsigned int and then bitwise OR it with c2. Also return the result as unsigned int. chars are promoted to int, but here we want unsigned int.
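
The promotion rule quoted above can be observed with decltype. The sketch below assumes an 8-bit char and an int that can represent every char value. widen_direct and widen_via_uchar are hypothetical helpers. They also show that converting a negative char straight to unsigned int still carries the sign-extended bits, so going through unsigned char first is probably the safer route:

```cpp
#include <type_traits>

// Integral promotion: a signed char operand of << becomes int first,
// so the shift result has type int.
static_assert(std::is_same<decltype(static_cast<signed char>(1) << 8), int>::value,
              "signed char << int yields int");

// With the left operand cast to unsigned int, the result is unsigned int.
static_assert(std::is_same<decltype(static_cast<unsigned int>(1) << 8), unsigned int>::value,
              "unsigned int << int yields unsigned int");

// Hypothetical helpers contrasting two ways of widening a possibly negative byte.
constexpr unsigned int widen_direct(signed char c)
{
    return static_cast<unsigned int>(c);  // keeps the sign-extended bits
}

constexpr unsigned int widen_via_uchar(signed char c)
{
    return static_cast<unsigned int>(static_cast<unsigned char>(c));  // low 8 bits only
}

static_assert(widen_direct(-5) > 0xFFu, "direct conversion wraps to a large value");
static_assert(widen_via_uchar(-5) == 0xFBu, "via unsigned char: 1111 1011");
```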

The reason is that c2 is first promoted to int before doing the bitwise OR, which causes sign extension to take place (assuming that char is signed and can hold negative values):

char x1 = -2; // 1111 1110
char x2 = -3; // 1111 1101

short int si = stitch(x1, x2); // 1111 1111 1111 1101

The representation of x2 promoted to int has (at least) one more byte full of 1s, so it overwrites the zero bit of x1 that you previously shifted up. You can cast c2 to unsigned char first. With two's complement representation, that will not change the bit pattern in the lowest byte. While not strictly necessary, you may cast c1 to unsigned char too, for consistency (if short is 2 bytes long, it won't matter that c1 was sign-extended beyond those 2 bytes):

short int stitch(char c1, char c2) {
    return ((unsigned char)c1 << 8) | (unsigned char)c2;
}

The shift/or method, once fixed, is cleaner as it does not depend on byte ordering.
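
To illustrate the byte-order point, the sketch below builds the value once with shift/or and once by copying the two bytes into a 16-bit integer with std::memcpy, which stands in here for the union as a well-defined way to reinterpret memory. All function names are hypothetical, and an 8-bit byte is assumed:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Byte-order independent: hi always lands in the upper 8 bits.
std::uint16_t stitch_shift(std::uint8_t hi, std::uint8_t lo)
{
    return static_cast<std::uint16_t>((hi << 8) | lo);
}

// Byte-order dependent: the first byte in memory is the low byte on
// little-endian machines and the high byte on big-endian ones.
std::uint16_t stitch_memory(std::uint8_t first, std::uint8_t second)
{
    const std::uint8_t bytes[2] = { first, second };
    std::uint16_t out;
    std::memcpy(&out, bytes, sizeof out);
    return out;
}

// Reports whether this platform stores the low byte first.
bool is_little_endian()
{
    return stitch_memory(0x01, 0x00) == 0x0001;
}

// Usage example: only the shift version gives one answer everywhere.
void demo_byte_order()
{
    assert(stitch_shift(0x0B, 0x05) == 0x0B05);
    assert(stitch_memory(0x0B, 0x05) == (is_little_endian() ? 0x050B : 0x0B05));
}
```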

Apart from that, the union method is likely slower on many modern CPUs due to a store-to-load-forwarding (STLF) problem. You are writing a value to memory and then reading it back as different data types. Many CPUs cannot send the data to the load quickly if this occurs. The load needs to wait until the store is entirely done (retired), writing its data to the L1 cache.

On very old CPUs without a barrel shifter (shifting by 8 needs 8 operations) and with simple in-order execution, such as the 68000, the union method may be faster.

You should not use a union for that. You should never use union fields simultaneously. If a union has member A and member B, you must treat A and B as unrelated. That's because the compiler is free to add padding anywhere it wants (except at the front of a struct). Another problem is byte order (little/big endian).

EDIT: There is an exception to the above "union rule": you can use members simultaneously if they are at the front and have the same layout, e.g.

union {
    struct {
        char c;
        int i;
        short s;
    } A;
    struct {
        char c;
        int i;
        char c1;
        char c2;
    } B;
};

A.c and A.i can be used simultaneously with B.c and B.i.
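
A small sketch of that exception, with the two structs given names so the example compiles on its own. The union name AB and the function read_i_via_b are hypothetical. It relies on both structs being standard-layout and sharing the leading members c and i:

```cpp
#include <cassert>

struct A { char c; int i; short s; };
struct B { char c; int i; char c1; char c2; };

union AB {
    A a;
    B b;
};

// Stores an A in the union, then reads the shared leading member i through B.
int read_i_via_b(char c, int i)
{
    AB u = { { c, i, 0 } };  // initializes the first member, u.a
    return u.b.i;            // c and i are common to A and B, so this read is permitted
}

// Usage example.
void demo_common_prefix()
{
    assert(read_i_via_b('x', 42) == 42);
}
```

Reading u.b.c1 or u.b.c2 here would fall outside the shared part and would not be covered by the exception.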

