I have two char
's and I want to "stitch" them bitwise together.
For example:
char c1 = 11; // 0000 1011
char c2 = 5; // 0000 0101
short int si = stitch(c1, c2); // 0000 1011 0000 0101
So, what I first tried was with bitwise operators:
short int stitch(char c1, char c2)
{
return (c1 << 8) | c2;
}
But this doesn't work: I'm getting a short
equals to c2
... (1) Why?
(But: c1
and c2
are negative numbers in my real app... maybe this is a part of the problem?)
So, my second solution was to use a union
:
union stUnion
{
struct
{
char c1;
char c2;
}
short int si;
}
short int stitch(char c1, char c2)
{
stUnion u;
u.c1 = c1;
u.c2 = c2;
return u.si;
}
This works like I want... I think
(2) What is the best/fastest way?
Thanks!
The union
method is implementation-defined at best (in practice, it will pretty reliably work but the format of si
depends on the endianness of the platform).
The problem with the bitwise way is, as you suspect, the negative numbers. A negative number is represented by a chain of leading 1's. So -5 for example is
1111 1011
If you cast this to int
or even unsigned int
, it becomes
1111 1111 1111 … 1111 1011
and all those 1's will drown out the left-shifted data when OR is applied.
To solve the problem, cast the char
s to unsigned char
and then to int
(to prevent overflow, or even the appearance of the possibility of overflow) before shifting:
short int stitch(char c1, char c2)
{
return ( (int) (unsigned char) c1 << 8) | (unsigned char) c2;
}
or, if you are free to change the types of the arguments and you can include <cstdint>
,
uint16_t stitch( uint8_t c1, uint8_t c2)
{
return ( (int) c1 << 8 ) | c2;
}
$5.8/1 states- "The operands shall be of integral or enumeration type and integral promotions are performed. The type of the result is that of the promoted left operand. The behavior is undefined if the right operand is negative, or reater than or equal to the length in bits of the promoted left operand."
So try to typecast c1 to unsigned int and then bitwise OR with C2. Also return the output as unsigned int. chars are promoted to int but we want to be 'unsigned int'
The reason is that c2
is first promoted to int
before doing the bitwise OR, which causes sign extension to take place (assuming that char is signed and can hold negative values):
char x1 = -2; // 1111 1110
char x2 = -3; // 1111 1101
short int si = stitch(c1, c2); // 1111 1111 1111 1101
The representation of x2
promoted to int
is (at least) 1 byte full of 1
, so it overwrites the zero bit of x1
that you previously shifted up. You can cast to unsigned char
first. With two complement representation, that will not change the bitpattern in the lowest byte. While not strictly necessary, you may cast c1
to unsigned char
too, for consistency (if short is 2 bytes long, it won't matter that c1
was sign extended beyond those 2 bytes)
short int stitch(char c1, char c2) {
return ((unsigned char)c1 << 8) | (unsigned char)c2;
}
The shift/or method, once fixed, is cleaner as it does not depend on byte ordering.
Apart from that, the union method is likely slower on many modern CPUs due to a store-to-load-forwarding (STLF) problem. You are writing a value to memory and then reading it back as different data types. Many CPUs cannot send the data to the load quickly if this occurs. The load needs to wait until the store is entirely done (retired), writing its data to the L1 cache.
On very old CPUs without a barrel shifter (shifting by 8 needs 8 operations) and with simple in-order execution, such as the 68000, the union method may be faster.
You should not use an union for that. You should never use union fields simultaneously. If an union has member A and member B then you must consider that A and B are unrelated. That's because compiler is free to add padding anywhere it wants (except at the front of struct). Another problem is byte order (little/big endian).
//EDIT There is exception to above "union rule", you can use these members simultaneously which are at the fron and have the same layout. ie
union {
struct {
char c;
int i;
short s;
} A;
struct {
char c;
int i;
char c1;
char c2;
} B;
};
Ac and Ai can be used simultaneously with Bc and Bi
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.