
Converting byte array (char array) to an integer type (short, int, long)

I was wondering if system endianness matters when converting a byte array to a short / int / long. Would this be incorrect to do if the code runs on both big-endian and little-endian machines?

short s = (b[0] << 8) | (b[1]);
int i = (b[0] << 24) | (b[1] << 16) | (b[2] << 8) | (b[3]);

Yes, endianness matters. In little-endian order the least significant byte comes first and the most significant byte last, so the last byte of the array belongs in the upper part of the short or int, i.e. bits 8-15 for a short and bits 24-31 for an int. The code in the question treats the array as big-endian; for a little-endian array the byte order needs to be reversed:

short s = ((b[1] << 8) | b[0]);
int i = (b[3] << 24) | (b[2] << 16) | (b[1] << 8) | (b[0]);

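Note that this assumes that the byte array is in little-endian order. Conversion between a byte array and integer types depends not only on the endianness of the CPU but also on the endianness of the byte array data. With shift expressions like the ones above, only the byte order of the array decides which index goes where; the CPU's endianness comes into play when the bytes are copied or reinterpreted in place (memcpy, pointer casts, unions).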
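
As a minimal sketch of the note above: when the value is assembled with shifts, the index order is chosen to match the byte order of the array, and the same source then gives the same result on any CPU. The helper names (load_le16, load_be16, load_le32, load_be32) are hypothetical and not taken from any library.

```c
#include <stdint.h>

/* Hypothetical helpers. The suffix names the byte order of the ARRAY,
   not of the CPU. */

static uint16_t load_le16(const unsigned char *b)
{
    /* b[0] is the least significant byte */
    return (uint16_t)(b[0] | ((unsigned)b[1] << 8));
}

static uint16_t load_be16(const unsigned char *b)
{
    /* b[0] is the most significant byte */
    return (uint16_t)(((unsigned)b[0] << 8) | b[1]);
}

static uint32_t load_le32(const unsigned char *b)
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

static uint32_t load_be32(const unsigned char *b)
{
    return ((uint32_t)b[0] << 24)
         | ((uint32_t)b[1] << 16)
         | ((uint32_t)b[2] << 8)
         | (uint32_t)b[3];
}
```

For example, load_be32 applied to the bytes { 0x12, 0x34, 0x56, 0x78 } yields 0x12345678, while load_le32 applied to the same bytes yields 0x78563412, on big-endian and little-endian machines alike.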

It is recommended to wrap these conversions in functions that will know (either via compilation flags or at run time) the endianness of the system and perform the conversion correctly.
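
One way such a wrapper might look, as a rough sketch only: the host's byte order is probed at run time, the bytes are copied into an integer with memcpy, and the value is swapped when the host order differs from the array order. The names host_is_little_endian, swap32 and read_be32 are made up for illustration, and the sketch assumes the host is either purely little-endian or purely big-endian.

```c
#include <stdint.h>
#include <string.h>

/* Run-time probe: returns 1 on a little-endian host, 0 otherwise. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* look at the byte at the lowest address */
    return first == 1;
}

/* Reverse the four bytes of v. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24)
         | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u)
         | (v << 24);
}

/* Read a 32-bit value whose bytes are stored in big-endian order in b. */
static uint32_t read_be32(const unsigned char *b)
{
    uint32_t v;
    memcpy(&v, b, sizeof v);     /* bytes land in memory unchanged */
    return host_is_little_endian() ? swap32(v) : v;
}
```

The same decision can be made at compile time instead; GCC and Clang, for instance, predefine the __BYTE_ORDER__ macro for that purpose, but that is compiler-specific.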

In addition, creating a standard for the byte array data (always big endian, for example) and then using the socket functions ntohs and ntohl will offload the decision regarding endianness to the OS socket implementation, which is aware of such things. Note that network byte order is big endian (the n in ntohs / ntohl stands for network), so having the byte array data as big endian would be the most straightforward way to do this.
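
A small sketch of that approach, assuming a POSIX system where ntohs and ntohl are declared in <arpa/inet.h> (on Windows they come from the Winsock headers instead). The wrapper names read_net16 and read_net32 are hypothetical.

```c
#include <arpa/inet.h>   /* ntohs, ntohl (POSIX) */
#include <stdint.h>
#include <string.h>

/* Read a 16-bit value stored in network (big-endian) order. */
static uint16_t read_net16(const unsigned char *b)
{
    uint16_t v;
    memcpy(&v, b, sizeof v);   /* copy avoids alignment and aliasing problems */
    return ntohs(v);           /* network order -> host order */
}

/* Read a 32-bit value stored in network (big-endian) order. */
static uint32_t read_net32(const unsigned char *b)
{
    uint32_t v;
    memcpy(&v, b, sizeof v);
    return ntohl(v);
}
```

On a big-endian host ntohs and ntohl leave the value unchanged, and on a little-endian host they swap the bytes, so the caller never has to test the host's endianness itself.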

As pointed out by the OP (@Mike), boost also provides endianness conversion functions.

// on little endian:

unsigned char c[] = { 1, 0 };       // "one" in little endian order { LSB, MSB }

int a = (c[1] << 8) | c[0];         // a = 1

//----------------------------------------------------------------------------

// on big endian:

unsigned char c[] = { 0, 1 };       // "one" in big endian order { MSB, LSB }

int a = (c[0] << 8) | c[1];         // a = 1

//----------------------------------------------------------------------------

// on little endian:

unsigned char c[] = { 0, 1 };       // "one" in big endian order { MSB, LSB }

int a = (c[0] << 8) | c[1];         // a = 1 (reverse byte order)

//----------------------------------------------------------------------------

// on big endian:

unsigned char c[] = { 1, 0 };       // "one" in little endian order { LSB, MSB }

int a = (c[1] << 8) | c[0];         // a = 1 (reverse byte order)

You can use unions for this. Endianness matters; to change it you can use the x86 BSWAP instruction (or its analogues on other platforms), which most C compilers provide as an intrinsic.

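#include <stdio.h>

/* All members share the same storage, so writing qword and then reading
   bytes/words/dwords shows how the host lays the value out in memory.
   Assumes 16-bit short, 32-bit int and 64-bit long long. */
typedef union{
  unsigned char bytes[8];
  unsigned short int words[4];
  unsigned int dwords[2];
  unsigned long long int qword;
} test;

int main(void){
  /* sizeof yields a size_t, which is printed with %zu */
  printf("%zu %zu %zu %zu %zu\n", sizeof(char), sizeof(short), sizeof(int), sizeof(long), sizeof(long long));
  test t;
  t.qword=0x0001020304050607ull;
  printf("%02hhX|%02hhX|%02hhX|%02hhX|%02hhX|%02hhX|%02hhX|%02hhX\n",t.bytes[0],t.bytes[1] ,t.bytes[2],t.bytes[3],t.bytes[4],t.bytes[5],t.bytes[6],t.bytes[7]);
  printf("%04hX|%04hX|%04hX|%04hX\n" ,t.words[0] ,t.words[1] ,t.words[2] ,t.words[3]);
  /* dwords are unsigned int, so %X rather than %lX */
  printf("%08X|%08X\n" ,t.dwords[0] ,t.dwords[1]);
  /* %llX is the standard conversion for unsigned long long */
  printf("%016llX\n" ,t.qword);
  return 0;
}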
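
The byte-swap intrinsic mentioned above could be wrapped roughly as follows. This is only a sketch: __builtin_bswap32 is the GCC/Clang builtin and _byteswap_ulong is the MSVC intrinsic, and the wrapper name bswap32 with its plain-C fallback is a hypothetical helper for compilers that offer neither.

```c
#include <stdint.h>
#if defined(_MSC_VER)
#include <stdlib.h>      /* _byteswap_ulong */
#endif

/* Reverse the byte order of a 32-bit value. */
static uint32_t bswap32(uint32_t v)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap32(v);          /* typically compiles to BSWAP on x86 */
#elif defined(_MSC_VER)
    return (uint32_t)_byteswap_ulong(v);
#else
    /* Portable fallback using shifts and masks. */
    return (v >> 24)
         | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u)
         | (v << 24);
#endif
}
```

For example, bswap32(0x01020304) returns 0x04030201.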

No, that's fine as far as endianness is concerned, but you may have problems if your ints are only 16 bits wide.
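
To illustrate the width problem: in b[0] << 24 the byte is promoted to int before shifting, so with a 16-bit int the shift count exceeds the width of the type. A related pitfall appears when the array is a plain or signed char array, because a byte with the high bit set is sign-extended. The sketch below contrasts the two; naive_be16, safe_be16 and safe_be32 are hypothetical names, and the naive result noted in the comment assumes a two's complement int wider than 16 bits.

```c
#include <stdint.h>

/* The question's expression applied to a signed char buffer. */
static int naive_be16(const signed char *b)
{
    /* A negative b[1] is sign-extended and sets all upper bits.
       A negative b[0] would make the shift itself undefined. */
    return (b[0] << 8) | b[1];
}

/* Convert each byte to unsigned char first, then widen before shifting. */
static uint16_t safe_be16(const signed char *b)
{
    return (uint16_t)(((unsigned)(unsigned char)b[0] << 8)
                     | (unsigned char)b[1]);
}

/* Casting to uint32_t keeps a shift by 24 valid even where int is 16 bits. */
static uint32_t safe_be32(const signed char *b)
{
    return ((uint32_t)(unsigned char)b[0] << 24)
         | ((uint32_t)(unsigned char)b[1] << 16)
         | ((uint32_t)(unsigned char)b[2] << 8)
         | (uint32_t)(unsigned char)b[3];
}
```

With the two bytes 0x12 and 0x80 (stored as -128 in a signed char), safe_be16 returns 0x1280, whereas naive_be16 returns a negative number because the sign extension of the second byte overwrites the upper bits.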

The problem as you've specified, where you are using an existing byte array, will work fine across all machines. You will end up with the same answer.

However, depending on how you are creating that stream, it may be affected by endianness and you may not end up with the number you think you will.
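
A short sketch of the two ways such a stream might be produced. Copying the integer's memory gives bytes in whatever order the writing machine uses, while writing the bytes out with shifts fixes the order. Both function names (store_host32, store_be32) are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Host-order dump: the resulting byte order depends on the CPU. */
static void store_host32(unsigned char *out, uint32_t v)
{
    memcpy(out, &v, sizeof v);
}

/* Explicit big-endian serialization: same bytes on every CPU. */
static void store_be32(unsigned char *out, uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);   /* most significant byte first */
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;           /* least significant byte last */
}
```

store_be32 with the value 0x12345678 always produces the bytes 12 34 56 78, which the shift-based code from the question reads back correctly. store_host32 produces 78 56 34 12 on a little-endian machine, so reading it back with the same code gives a different number.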
