c++ combining 2 uint8_t into one uint16_t not working?

Question

So I have a little piece of code that takes two uint8_t values, places them next to each other, and returns a uint16_t. The point is not to add the two variables, but to put them side by side and create a uint16_t from them. When the first uint8_t is 0 and the second uint8_t is 1, I expect the uint16_t to also be 1. However, that is not what my code does. This is my code:

uint8_t *bytes = new uint8_t[2];
bytes[0] = 0;
bytes[1] = 1;
uint16_t out = *((uint16_t*)bytes);

It is supposed to make the bytes uint8_t pointer into a uint16_t pointer, and then take the value. I expect that value to be 1 since x86 is little endian. However it returns 256. Setting the first byte to 1 and the second byte to 0 makes it work as expected. But I am wondering why I need to switch the bytes around in order for it to work.

Can anyone explain that to me?

Thanks!

Answer 1

There is no uint16_t or compatible object at that address, and so the behaviour of *((uint16_t*)bytes) is undefined.

"I expect that value to be 1 since x86 is little endian. However it returns 256."

Even if the program were fixed to have well-defined behaviour, your expectation is backwards. In little endian, the least significant byte is stored at the lowest address. Thus the 2-byte value 1 is stored as the bytes 1, 0 and not 0, 1.
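
To see the byte order on a given machine without relying on undefined behaviour, the bytes of a value can be copied out with std::memcpy. The following is a minimal sketch; the helper names object_bytes and is_little_endian are made up for illustration and are not part of the question's code.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the two bytes of `value` in memory order,
// i.e. element 0 is the byte stored at the lowest address.
// std::memcpy is used because copying an object's bytes is well defined,
// unlike dereferencing a casted uint16_t pointer.
std::array<std::uint8_t, 2> object_bytes(std::uint16_t value) {
    std::array<std::uint8_t, 2> bytes{};
    std::memcpy(bytes.data(), &value, sizeof value);
    return bytes;
}

// Hypothetical helper: true when the least significant byte is stored first.
bool is_little_endian() {
    return object_bytes(1)[0] == 1;
}
```

On a little-endian machine such as x86, object_bytes(1) yields the bytes 1, 0 and object_bytes(256) yields 0, 1, which is why the bytes 0, 1 in the question read back as 256.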

"Does endianness also affect the order of the bits in the byte or not?"

There is no way to access a bit by "address" ¹, so there is no concept of endianness for the bits within a byte. When converting to text, bits are conventionally shown with the most significant on the left and the least significant on the right, just like the digits of decimal numbers. I don't know if this is true in right-to-left writing systems.

¹ You can sort of create "virtual addresses" for bits using bitfields. The order of bitfields, i.e. whether the first bitfield is the most or least significant, is implementation defined and not necessarily related to byte endianness at all.

Here is a correct way to set two octets as uint16_t. The result will depend on the endianness of the system:

// needs <cstddef> for std::byte and <cstdint> for uint16_t
// no need to complicate a simple example with dynamic allocation
uint16_t out;
// note that there is an exception in the language rules that
// allows accessing any object through narrow (unsigned) char
// or std::byte pointers; thus the following is well defined
std::byte* data = reinterpret_cast<std::byte*>(&out);
// std::byte has no implicit conversion from int, so brace-initialise it
data[0] = std::byte{1};
data[1] = std::byte{0};

Note that assuming the input is in native endianness is usually not a good choice, especially when compatibility across multiple systems is required, such as when communicating over a network or accessing files that may be shared with other systems.

In these cases, the communication protocol or the file format typically specifies that the data is in a specific endianness, which may or may not be the same as the native endianness of your target system. The de facto standard in network communication is big endian. Data in a particular endianness can be converted to native endianness using bit shifts, as shown in Frodyne's answer for example.
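
As a rough sketch of that conversion, assuming a format that stores 16-bit values in big endian, the bytes can be decoded and encoded with shifts so that the host's own byte order never matters. The names read_be16 and write_be16 are hypothetical helpers, not functions from any library.

```cpp
#include <cstdint>

// Hypothetical helper: decodes a 16-bit big-endian (network order) value.
// p[0] is treated as the most significant byte on every host.
std::uint16_t read_be16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Hypothetical helper: encodes `value` as two big-endian bytes in p[0..1].
void write_be16(std::uint16_t value, std::uint8_t* p) {
    p[0] = static_cast<std::uint8_t>(value >> 8);    // high byte first
    p[1] = static_cast<std::uint8_t>(value & 0xFF);  // low byte second
}
```

With big-endian data, the bytes 0, 1 from the question do decode to 1, and the bytes 0x12, 0x34 decode to 0x1234.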

Answer 2

In a little-endian system the least significant bytes are placed first. In other words: the low byte is placed at offset 0, the high byte at offset 1 (and so on). So this:

uint8_t* bytes = new uint8_t[2];
bytes[0] = 1;
bytes[1] = 0;
uint16_t out = *((uint16_t*)bytes);

Produces the out = 1 result you want.

However, as you can see this is easy to get wrong, so in general I would recommend that instead of trying to place stuff correctly in memory and then cast it around, you do something like this:

uint16_t out = lowByte + (highByte << 8);

That will work on any machine, regardless of endianness.
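
Wrapped in a function, the same expression might look like the sketch below. The function names are made up for illustration. The uint8_t operands are promoted to int before the shift, so no bits are lost, and the cast narrows the result back to 16 bits.

```cpp
#include <cstdint>

// Hypothetical helper mirroring `lowByte + (highByte << 8)`.
// Both operands are promoted to int before the shift, so the high byte
// is not truncated; the cast narrows the sum back to 16 bits.
std::uint16_t combine_bytes(std::uint8_t lowByte, std::uint8_t highByte) {
    return static_cast<std::uint16_t>(lowByte + (highByte << 8));
}

// Hypothetical inverse helpers: split a 16-bit value into its two bytes.
std::uint8_t low_byte(std::uint16_t value) {
    return static_cast<std::uint8_t>(value & 0xFF);
}

std::uint8_t high_byte(std::uint16_t value) {
    return static_cast<std::uint8_t>(value >> 8);
}
```

Here combine_bytes(1, 0) is 1 and combine_bytes(0, 1) is 256 on every machine, because the code names the low and high byte explicitly instead of relying on memory layout.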

Edit: Bit shifting explanation added.

x << y means to shift the bits in x y places to the left ( >> moves them to the right instead).

If X contains the bit-pattern xxxxxxxx, and Y contains the bit-pattern yyyyyyyy, then (X << 8) produces the pattern xxxxxxxx00000000, and Y + (X << 8) produces xxxxxxxxyyyyyyyy.

(And Y + (X << 8) + (Z << 16) produces zzzzzzzzxxxxxxxxyyyyyyyy, etc.)

A single shift to the left is the same as multiplying by 2, so X << 8 is the same as X * 2^8 = X * 256. That means that you can also do Y + (X * 256) + (Z * 65536), but I think the shifts are clearer and show the intent better.

Note again that endianness does not matter: shifting 8 bits to the left always clears the low 8 bits.
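
The equivalence between shifting and multiplying can be checked with a small sketch. The function names pack3_shift and pack3_multiply are invented for this example. Because the three bytes occupy non-overlapping bit ranges, combining them with | gives the same result as adding them.

```cpp
#include <cstdint>

// Hypothetical helper: packs three bytes as zzzzzzzzxxxxxxxxyyyyyyyy
// using shifts. Each byte is widened to 32 bits before shifting.
std::uint32_t pack3_shift(std::uint8_t z, std::uint8_t x, std::uint8_t y) {
    return (static_cast<std::uint32_t>(z) << 16) |
           (static_cast<std::uint32_t>(x) << 8) |
           static_cast<std::uint32_t>(y);
}

// Hypothetical helper: the same packing written with multiplication,
// since shifting left by 8 multiplies by 256 and by 16 multiplies by 65536.
std::uint32_t pack3_multiply(std::uint8_t z, std::uint8_t x, std::uint8_t y) {
    return static_cast<std::uint32_t>(y) +
           static_cast<std::uint32_t>(x) * 256u +
           static_cast<std::uint32_t>(z) * 65536u;
}
```

Both functions return 0x123456 for the bytes 0x12, 0x34, 0x56.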

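You can read more here: https://en.wikipedia.org/wiki/Bitwise_operation . Note the difference between arithmetic and logical right shifts: in C/C++, right-shifting an unsigned value is a logical shift, while right-shifting a negative signed value is an arithmetic shift on most compilers (the language only guarantees this since C++20; before that, and in C, the result is implementation-defined).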

Answer 3

If p is a pointer to some multi-byte value, then:

"Little-endian" means that the byte at p is the least-significant byte, in other words, it contains bits 0-7 of the value.
"Big-endian" means that the byte at p is the most-significant byte, which for a 16-bit value would be bits 8-15.

Since Intel x86 is little-endian, bytes[0] contains bits 0-7 of the uint16_t value and bytes[1] contains bits 8-15. Since you are trying to set bit 0, you need:

bytes[0] = 1; // Bits 0-7
bytes[1] = 0; // Bits 8-15

Answer 4

Your code works, but you misinterpreted how to read the bytes:

#include <cstdint>
#include <cstddef>
#include <iostream>

int main()
{
    uint8_t *in = new uint8_t[2];
    in[0] = 3;
    in[1] = 1;
    uint16_t out = *((uint16_t*)in);

    std::cout << "out: " << out << "\n in: " << in[1]*256 + in[0]<< std::endl;

    return 0;
}

By the way, you should take care of alignment when casting this way.
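
One way to sidestep the alignment concern is to copy the bytes into a real uint16_t with std::memcpy instead of casting the pointer. This is only a sketch, and load_u16_native is a hypothetical name. The result is still in native byte order, so it depends on the machine's endianness just like the cast does.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reads a uint16_t in native byte order from any
// address, aligned or not. std::memcpy has no alignment requirement on
// its source, and the copy targets a genuine uint16_t object.
std::uint16_t load_u16_native(const std::uint8_t* p) {
    std::uint16_t value = 0;
    std::memcpy(&value, p, sizeof value);
    return value;
}
```

For the bytes 3, 1 used above, this returns 259 on a little-endian machine, matching in[1]*256 + in[0], even when the bytes start at an odd offset inside a larger buffer.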

Answer 5

One way to think about the numbers is in terms of MSB and LSB order,
where the MSB is the highest bit and the LSB is the lowest bit
(shown here for little-endian machines).

For example:

(u)int32:  MSB:Bit 31 ...  LSB: Bit 0
(u)int16:  MSB:Bit 15 ...  LSB: Bit 0
(u)int8 :  MSB:Bit  7 ...  LSB: Bit 0

With your cast to a 16-bit value, the bytes are arranged like this:

16Bit                <=  8Bit       8Bit
MSB     ...    LSB       BYTE[1]    BYTE[0]
Bit15          Bit0      Bit7 .. 0  Bit7 .. 0
0000 0001 0000 0000      0000 0001  0000 0000

which is 256, the correct value.

5 answers

solution1
4 ACCEPTED 2019-07-25 11:47:31

solution2
3 2019-07-25 11:47:01

solution3
2 2019-07-25 11:39:20

solution4
1 2019-07-25 11:51:02

solution5
0 2019-07-25 12:20:00
