Converting byte array (char array) to an integer type (short, int, long)

I was wondering if system endianness matters when converting a byte array to a short / int / long. Would this be incorrect to do if the code runs on both big-endian and little-endian machines?

short s = (b[0] << 8) | (b[1]);
int i = (b[0] << 24) | (b[1] << 16) | (b[2] << 8) | (b[3]);

Yes, endianness matters. The code in the question puts b[0] in the most significant position (bits 8-15 for a short, bits 24-31 for an int), which matches a byte array stored in big-endian order. If the byte array is stored in little-endian order, the byte order needs to be reversed:

short s = ((b[1] << 8) | b[0]);
int i = (b[3] << 24) | (b[2] << 16) | (b[1] << 8) | (b[0]);

Note that this assumes that the byte array is in little-endian order. Conversion between a byte array and an integer type depends not only on the endianness of the CPU but also on the endianness of the byte array data.

It is recommended to wrap these conversions in functions that will know (either via compilation flags or at run time) the endianness of the system and perform the conversion correctly.
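A minimal sketch of such wrapper functions, assuming a run-time check is acceptable and that the bytes are held in an unsigned char array. The names host_is_little_endian, swap16, load_be16 and load_le16 are hypothetical, chosen only for illustration:

```c
#include <stdint.h>
#include <string.h>

/* Run-time check: returns 1 on a little-endian host, 0 otherwise. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* lowest-addressed byte of the probe */
    return first == 1;
}

/* Swaps the two bytes of a 16-bit value. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Reads 2 bytes stored in big-endian order { MSB, LSB }. */
static uint16_t load_be16(const unsigned char *b)
{
    uint16_t v;
    memcpy(&v, b, sizeof v);     /* raw copy; its value depends on the host */
    return host_is_little_endian() ? swap16(v) : v;
}

/* Reads 2 bytes stored in little-endian order { LSB, MSB }. */
static uint16_t load_le16(const unsigned char *b)
{
    uint16_t v;
    memcpy(&v, b, sizeof v);
    return host_is_little_endian() ? v : swap16(v);
}
```

A caller picks the function that matches the byte order of the data, for example load_be16(buf) for data that was written in big-endian order, and the wrapper deals with the host.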

In addition, defining a standard for the byte array data (always big endian, for example) and then using the socket functions ntohs and ntohl offloads the decision regarding endianness to the OS socket implementation, which is aware of such things. Note that network byte order is big endian (the n in ntohX stands for network), so keeping the byte array data in big endian is the most straightforward way to do this.
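A sketch of that approach, assuming a POSIX system where ntohs and ntohl are declared in <arpa/inet.h> (on Windows they come from the Winsock headers instead). The helper names be_bytes_to_u16 and be_bytes_to_u32 are hypothetical:

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Converts 2 bytes in network (big-endian) order to a host-order value. */
static uint16_t be_bytes_to_u16(const unsigned char *b)
{
    uint16_t net;
    memcpy(&net, b, sizeof net);   /* avoids alignment and aliasing issues */
    return ntohs(net);
}

/* Converts 4 bytes in network (big-endian) order to a host-order value. */
static uint32_t be_bytes_to_u32(const unsigned char *b)
{
    uint32_t net;
    memcpy(&net, b, sizeof net);
    return ntohl(net);
}
```

The memcpy is used rather than a pointer cast because the byte array may not be suitably aligned for a 16-bit or 32-bit access.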

As pointed out by the OP (@Mike), Boost also provides endianness conversion functions.

// on little endian:

unsigned char c[] = { 1, 0 };       // "one" in little endian order { LSB, MSB }

int a = (c[1] << 8) | c[0];         // a = 1

//----------------------------------------------------------------------------

// on big endian:

unsigned char c[] = { 0, 1 };       // "one" in big endian order { MSB, LSB }

int a = (c[0] << 8) | c[1];         // a = 1

//----------------------------------------------------------------------------

// on little endian:

unsigned char c[] = { 0, 1 };       // "one" in big endian order { MSB, LSB }

int a = (c[0] << 8) | c[1];         // a = 1 (reverse byte order)

//----------------------------------------------------------------------------

// on big endian:

unsigned char c[] = { 1, 0 };       // "one" in little endian order { LSB, MSB }

int a = (c[1] << 8) | c[0];         // a = 1 (reverse byte order)

You can use unions for this. Endianness matters; to change it, you can use the x86 BSWAP instruction (or its analogues on other platforms), which most C compilers provide as an intrinsic.

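#include <stdio.h>

/* Views the same 8 bytes as bytes, words, dwords or one qword.
   Assumes a 16-bit short, a 32-bit int and a 64-bit long long. */
typedef union {
  unsigned char bytes[8];
  unsigned short int words[4];
  unsigned int dwords[2];
  unsigned long long int qword;
} test;

int main(void) {
  /* sizeof yields a size_t, which is printed with %zu */
  printf("%zu %zu %zu %zu %zu\n", sizeof(char), sizeof(short), sizeof(int), sizeof(long), sizeof(long long));
  test t;
  t.qword = 0x0001020304050607ULL;
  /* The order of the bytes, words and dwords printed below depends on the host's endianness */
  printf("%02hhX|%02hhX|%02hhX|%02hhX|%02hhX|%02hhX|%02hhX|%02hhX\n", t.bytes[0], t.bytes[1], t.bytes[2], t.bytes[3], t.bytes[4], t.bytes[5], t.bytes[6], t.bytes[7]);
  printf("%04hX|%04hX|%04hX|%04hX\n", t.words[0], t.words[1], t.words[2], t.words[3]);
  printf("%08X|%08X\n", t.dwords[0], t.dwords[1]);   /* unsigned int takes %X, not %lX */
  printf("%016llX\n", t.qword);                      /* %llX is the standard form of %qX */
  return 0;
}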
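The byte swap mentioned in this answer could look like the sketch below. It assumes GCC or Clang for __builtin_bswap32 and MSVC for _byteswap_ulong, and falls back to plain shifts elsewhere. The names bswap32 and bswap32_portable are hypothetical:

```c
#include <stdint.h>
#if defined(_MSC_VER)
#include <stdlib.h>   /* _byteswap_ulong */
#endif

/* Portable byte swap written with shifts and masks. */
static uint32_t bswap32_portable(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Uses a compiler intrinsic where one is known, else the portable version. */
static uint32_t bswap32(uint32_t v)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap32(v);
#elif defined(_MSC_VER)
    return (uint32_t)_byteswap_ulong(v);
#else
    return bswap32_portable(v);
#endif
}
```

On x86, compilers typically turn either form into a single BSWAP instruction when optimizing, so the portable version is a reasonable default when intrinsics are not wanted.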

No, this is fine as far as endianness is concerned, but you may run into problems if your int is only 16 bits wide.
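The width problem comes from integer promotion: each b[n] is promoted to int before the shift, so b[0] << 24 is undefined when int has only 16 bits, and it can also overflow a 32-bit int when b[0] is 0x80 or larger. A minimal sketch of one way around this, assuming the fixed-width types from <stdint.h> are available; the name be_bytes_to_u32_wide is hypothetical:

```c
#include <stdint.h>

/* Widens each byte to uint32_t before shifting, so the shift never
   happens in a type that is too narrow or signed. */
static uint32_t be_bytes_to_u32_wide(const unsigned char *b)
{
    return ((uint32_t)b[0] << 24) |
           ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |
            (uint32_t)b[3];
}
```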

The problem as you've specified, where you are using an existing byte array, will work fine across all machines. You will end up with the same answer.

However, depending on how you are creating that stream, it may be affected by endianness and you may not end up with the number you think you will.
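To illustrate the difference on the writing side, here is a sketch with two hypothetical helpers: store_be32 produces the same bytes on every host, while store_native32 copies the in-memory representation and so produces bytes in whatever order the host uses:

```c
#include <stdint.h>
#include <string.h>

/* Writes v into out[0..3] in big-endian order on any host. */
static void store_be32(unsigned char *out, uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Copies the in-memory representation of v; the byte order follows the host. */
static void store_native32(unsigned char *out, uint32_t v)
{
    memcpy(out, &v, sizeof v);
}
```

A stream produced with store_be32 can be read back with the shift expressions from the question on any machine. A stream produced with store_native32 can only be read back correctly by code that knows which kind of machine wrote it.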
