What is the correct way to convert 2 bytes to a signed 16-bit integer?

In this answer, zwol made this claim:

The correct way to convert two bytes of data from an external source into a 16-bit signed integer is with helper functions like this:

#include <stdint.h>

int16_t be16_to_cpu_signed(const uint8_t data[static 2]) {
    uint32_t val = (((uint32_t)data[0]) << 8) | 
                   (((uint32_t)data[1]) << 0);
    return ((int32_t) val) - 0x10000u;
}

int16_t le16_to_cpu_signed(const uint8_t data[static 2]) {
    uint32_t val = (((uint32_t)data[0]) << 0) | 
                   (((uint32_t)data[1]) << 8);
    return ((int32_t) val) - 0x10000u;
}

Which of the above functions is appropriate depends on whether the array contains a little endian or a big endian representation. Endianness is not the issue in question here; I am wondering why zwol subtracts 0x10000u from the uint32_t value converted to int32_t.

Why is this the correct way?

How does it avoid the implementation-defined behavior when converting to the return type?

Since you can assume 2's complement representation, how would this simpler cast fail: return (uint16_t)val;

What is wrong with this naive solution:

int16_t le16_to_cpu_signed(const uint8_t data[static 2]) {
    return (uint16_t)data[0] | ((uint16_t)data[1] << 8);
}

If int is 16-bit then your version relies on implementation-defined behaviour if the value of the expression in the return statement is out of range for int16_t.

However the first version also has a similar problem; for example if int32_t is a typedef for int, and the input bytes are both 0xFF, then the result of the subtraction in the return statement is UINT_MAX, which causes implementation-defined behaviour when converted to int16_t.
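To see where UINT_MAX comes from, here is a minimal sketch that assumes int is 32 bits wide and int32_t is a typedef for int. The helper name zwol_return_expr is made up for illustration; it evaluates the same return expression but hands back the unsigned result instead of converting it to int16_t.

```c
#include <stdint.h>

/* Hypothetical helper: evaluates the return expression from the quoted
   code without the final conversion to int16_t.
   Assumption: int32_t is int and int is 32 bits wide. */
uint32_t zwol_return_expr(const uint8_t data[2]) {
    uint32_t val = ((uint32_t)data[0] << 8) | (uint32_t)data[1];
    /* int32_t minus unsigned int: the usual arithmetic conversions turn
       the signed operand into unsigned, so the subtraction wraps around
       instead of producing a negative number. */
    return ((int32_t)val) - 0x10000u;
}
```

With both bytes set to 0xFF the expression is 65535u - 65536u, which wraps to the largest unsigned value; that value is far outside the range of int16_t.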

IMHO the answer you link to has several major issues.

This should be pedantically correct and also work on platforms that use sign bit or 1's complement representations, instead of the usual 2's complement. The input bytes are assumed to be in 2's complement.

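int le16_to_cpu_signed(const uint8_t data[static 2]) {
    unsigned value = data[0] | ((unsigned)data[1] << 8);
    if (value & 0x8000)
        /* Mask after inverting: unsigned may be wider than 16 bits,
           and the bits above bit 14 must not reach the cast to int. */
        return -(int)(~value & 0x7FFF) - 1;
    else
        return value;
}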

Because of the branch, it will be more expensive than other options.

What this accomplishes is that it avoids any assumption about how the int representation relates to the unsigned representation on the platform. The cast to int is required to preserve the arithmetic value for any number that fits in the target type. Because the inversion ensures the top bit of the 16-bit number will be zero, the value will fit. Then the unary - and the subtraction of 1 apply the usual rule for 2's complement negation. Depending on the platform, INT16_MIN could still overflow if it doesn't fit in the int type on the target, in which case long should be used.

The difference from the original version in the question comes at return time. While the original always subtracted 0x10000 and let 2's complement signed overflow wrap the result into int16_t range, this version has an explicit if that avoids signed wraparound (which is undefined).

Now in practice, almost all platforms in use today use 2's complement representation. In fact, if the platform has a standard-compliant stdint.h that defines int32_t, it must use 2's complement for it. Where this approach sometimes comes in handy is with some scripting languages that don't have integer data types at all: you can modify the operations shown above for floats and they will give the correct result.
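As a rough illustration of the floating-point variant, here is a sketch written in C with double standing in for a scripting language's only numeric type. The function name le16_to_double is hypothetical, and because floats have no bitwise operators it replaces the inversion with a subtraction of 65536, which gives the same result for inputs with the sign bit set.

```c
#include <stdint.h>

/* Hypothetical helper: decodes a little-endian 16-bit two's complement
   value using only floating-point arithmetic. */
double le16_to_double(const uint8_t data[2]) {
    double value = data[0] + data[1] * 256.0;  /* 0 .. 65535 */
    if (value >= 32768.0)                      /* sign bit is set */
        value -= 65536.0;                      /* map to -32768 .. -1 */
    return value;
}
```

All intermediate values are small integers that a double represents exactly, so no rounding is involved.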

The arithmetic operators shift and bitwise-or in the expression (uint16_t)data[0] | ((uint16_t)data[1] << 8) don't work on types smaller than int, so those uint16_t values get promoted to int (or unsigned if sizeof(uint16_t) == sizeof(int)). Still, that should yield the correct answer, since only the lower 2 bytes contain the value.
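The promotion can be observed directly with C11 _Generic, which selects on the type of an expression without evaluating it. This is only a sketch; the function name naive_expr_is_int is invented here, and the result of 1 assumes a platform where int is wider than 16 bits.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if the naive expression has type int,
   0 otherwise. _Generic inspects the type only; the expression itself
   is not evaluated. */
int naive_expr_is_int(void) {
    uint8_t data[2] = {0x34, 0x12};
    return _Generic((uint16_t)data[0] | ((uint16_t)data[1] << 8),
                    int: 1,
                    default: 0);
}
```

On a platform where int is 16 bits wide, the operands would be promoted to unsigned int instead and the function would return 0.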

Another pedantically correct version for big-endian to little-endian conversion (assuming a little-endian CPU) is:

#include <string.h>
#include <stdint.h>

int16_t be16_to_cpu_signed(const uint8_t data[2]) {
    int16_t r;
    memcpy(&r, data, sizeof r);
    return __builtin_bswap16(r);
}

memcpy is used to copy the representation of int16_t, and that is the standard-compliant way to do so. This version also compiles into 1 instruction, movbe; see assembly.

Another method, using union:

union B2I16
{
   int16_t i;
   uint8_t b[2];   /* uint8_t from <stdint.h>; C has no built-in byte type */
};

In the program:

...
union B2I16 conv;

conv.b[0] = first_byte;
conv.b[1] = second_byte;
int16_t result = conv.i;

first_byte and second_byte can be swapped according to the little or big endian model. This method is not better, but it is one of the alternatives.

Here is another version that relies only on portable and well-defined behaviours (the header #include <endian.h> is not standard, the code is):

#include <endian.h>
#include <stdint.h>
#include <string.h>

static inline void swap(uint8_t* a, uint8_t* b) {
    uint8_t t = *a;
    *a = *b;
    *b = t;
}
static inline void reverse(uint8_t* data, int data_len) {
    for(int i = 0, j = data_len / 2; i < j; ++i)
        swap(data + i, data + data_len - 1 - i);
}

int16_t be16_to_cpu_signed(const uint8_t data[2]) {
    int16_t r;
#if __BYTE_ORDER == __LITTLE_ENDIAN
    uint8_t data2[sizeof r];
    memcpy(data2, data, sizeof data2);
    reverse(data2, sizeof data2);
    memcpy(&r, data2, sizeof r);
#else
    memcpy(&r, data, sizeof r);
#endif
    return r;
}

The little-endian version compiles to a single movbe instruction with clang; the gcc version is less optimal, see assembly.

I want to thank all contributors for their answers. Here is what the collective work boils down to:

  1. As per the C Standard 7.20.1.1 Exact-width integer types: types uint8_t, int16_t and uint16_t must use two's complement representation without any padding bits, so the actual bits of the representation are unambiguously those of the 2 bytes in the array, in the order specified by the function names.
  2. Computing the unsigned 16-bit value with (unsigned)data[0] | ((unsigned)data[1] << 8) (for the little endian version) compiles to a single instruction and yields an unsigned 16-bit value.
  3. As per the C Standard 6.3.1.3 Signed and unsigned integers: converting a value of type uint16_t to signed type int16_t has implementation-defined behavior if the value is not in the range of the destination type. No special provision is made for types whose representation is precisely defined.
  4. To avoid this implementation-defined behavior, one can test if the unsigned value is larger than INT16_MAX and compute the corresponding signed value by subtracting 0x10000. Doing this for all values as suggested by zwol may produce values outside the range of int16_t, with the same implementation-defined behavior.
  5. Testing for the 0x8000 bit explicitly causes the compilers to produce inefficient code.
  6. A more efficient conversion without implementation-defined behavior uses type punning via a union, but the debate regarding the definedness of this approach is still open, even at the C Standard's Committee level.
  7. Type punning can be performed portably and with defined behavior using memcpy.

Combining points 2 and 7, here is a portable and fully defined solution that compiles efficiently to a single instruction with both gcc and clang:

#include <stdint.h>
#include <string.h>

int16_t be16_to_cpu_signed(const uint8_t data[2]) {
    int16_t r;
    uint16_t u = (unsigned)data[1] | ((unsigned)data[0] << 8);
    memcpy(&r, &u, sizeof r);
    return r;
}

int16_t le16_to_cpu_signed(const uint8_t data[2]) {
    int16_t r;
    uint16_t u = (unsigned)data[0] | ((unsigned)data[1] << 8);
    memcpy(&r, &u, sizeof r);
    return r;
}

64-bit Assembly:

be16_to_cpu_signed(unsigned char const*):
        movbe   ax, WORD PTR [rdi]
        ret
le16_to_cpu_signed(unsigned char const*):
        movzx   eax, WORD PTR [rdi]
        ret

Why not just use your "naive solution," but cast each element to int16_t instead of uint16_t?

int16_t le16_to_cpu_signed(const uint8_t data[static 2]) {
    return (int16_t)data[0] | ((int16_t)data[1] << 8);
}

Then you would not have to deal with casting unsigned ints to signed ints (and possibly being out of the signed int range).
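One thing worth checking with this suggestion is the value of the expression before it is converted to the return type. The sketch below assumes int is 32 bits wide; the helper name int16_cast_expr is hypothetical and it returns int so that the intermediate value is visible.

```c
#include <stdint.h>

/* Hypothetical helper: value of the suggested expression before the
   conversion to int16_t. Assumption: int is 32 bits wide. */
int int16_cast_expr(const uint8_t data[2]) {
    /* Both int16_t operands are promoted to int, so the shift and the
       bitwise-or are carried out in int, not in int16_t. */
    return (int16_t)data[0] | ((int16_t)data[1] << 8);
}
```

For two 0xFF bytes the int result is 65535, which is still larger than INT16_MAX, so under this assumption the conversion to int16_t in the return statement appears to remain out of range just as in the uint16_t version.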
