What is the correct way to convert 2 bytes to a signed 16-bit integer?
In this answer, zwol made this claim:
The correct way to convert two bytes of data from an external source into a 16-bit signed integer is with helper functions like this:
#include <stdint.h>

int16_t be16_to_cpu_signed(const uint8_t data[static 2]) {
    uint32_t val = (((uint32_t)data[0]) << 8) |
                   (((uint32_t)data[1]) << 0);
    return ((int32_t) val) - 0x10000u;
}

int16_t le16_to_cpu_signed(const uint8_t data[static 2]) {
    uint32_t val = (((uint32_t)data[0]) << 0) |
                   (((uint32_t)data[1]) << 8);
    return ((int32_t) val) - 0x10000u;
}
Which of the above functions is appropriate depends on whether the array contains a little endian or a big endian representation. Endianness is not the issue in question here; I am wondering why zwol subtracts 0x10000u from the uint32_t value converted to int32_t.
Why is this the correct way?
How does it avoid the implementation-defined behavior when converting to the return type?
Since you can assume 2's complement representation, how would this simpler cast fail: return (uint16_t)val;
What is wrong with this naive solution:
int16_t le16_to_cpu_signed(const uint8_t data[static 2]) {
    return (uint16_t)data[0] | ((uint16_t)data[1] << 8);
}
If int is 16-bit then your version relies on implementation-defined behaviour if the value of the expression in the return statement is out of range for int16_t.
However, the first version has a similar problem. For example, if int32_t is a typedef for int and the input bytes are both 0xFF, then the result of the subtraction in the return statement is UINT_MAX, which causes implementation-defined behaviour when converted to int16_t.
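To make the UINT_MAX claim concrete, here is a minimal sketch. It assumes a platform where int32_t is a typedef for int, so the usual arithmetic conversions turn the subtraction into an unsigned one. The helper name zwol_difference is made up for illustration; it evaluates the same return expression as the little-endian function from the question but keeps the result as a uint32_t instead of narrowing it to int16_t.

```c
#include <stdint.h>

/* Hypothetical helper: evaluates the return expression from the
 * question for a little-endian byte pair, without the final
 * narrowing to int16_t.
 * Assumption: int32_t is int, so (int32_t)val - 0x10000u is computed
 * as unsigned int after the usual arithmetic conversions. */
uint32_t zwol_difference(const uint8_t data[2]) {
    uint32_t val = ((uint32_t)data[0] << 0) |
                   ((uint32_t)data[1] << 8);
    return ((int32_t)val) - 0x10000u;
}
```

Under that assumption, the bytes 0xFF 0xFF give 0xFFFFFFFF, and even the bytes 0x00 0x00 give 0xFFFF0000, so the value handed to the int16_t conversion is out of range for every input, not only for negative ones.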
IMHO the answer you link to has several major issues.
This should be pedantically correct and also work on platforms that use sign-and-magnitude or 1's complement representations instead of the usual 2's complement. The input bytes are assumed to be in 2's complement.
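int le16_to_cpu_signed(const uint8_t data[static 2]) {
    unsigned value = data[0] | ((unsigned)data[1] << 8);
    if (value & 0x8000)
        /* Mask to 16 bits so the inverted value (0..0x7FFF) fits in int
           even when unsigned is wider than 16 bits. */
        return -(int)(~value & 0xFFFFu) - 1;
    else
        return value;
}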
Because of the branch, it will be more expensive than other options.
What this accomplishes is that it avoids any assumption on how the int representation relates to the unsigned representation on the platform. The cast to int is required to preserve the arithmetic value for any number that will fit in the target type. Because the inversion ensures the top bit of the 16-bit number will be zero, the value will fit. Then the unary - and the subtraction of 1 apply the usual rule for 2's complement negation. Depending on the platform, INT16_MIN could still overflow if it doesn't fit in the int type on the target, in which case long should be used.
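As a sketch of the "use long" fallback mentioned above: the C standard guarantees that long can hold at least -2147483647..2147483647, so -32768 is always representable in it. The function name le16_to_long_signed is made up for this example, and the inversion is masked to 16 bits so that the claim "the value will fit" holds regardless of how wide unsigned long is.

```c
#include <stdint.h>

/* Sketch: the same branch-based decoding, computed in long so that
 * -32768 is representable on any conforming implementation.
 * The name is hypothetical, not from the original answer. */
long le16_to_long_signed(const uint8_t data[2]) {
    unsigned long value = data[0] | ((unsigned long)data[1] << 8);
    if (value & 0x8000UL) {
        /* Invert only the low 16 bits: the result is in 0..0x7FFF. */
        long inverted = (long)(~value & 0xFFFFUL);
        return -inverted - 1;
    }
    return (long)value;
}
```

For example, the bytes 0x00 0x80 decode to -32768 and 0xFF 0xFF decode to -1, without any conversion of an out-of-range value to a signed type.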
The difference from the original version in the question shows up at the return statement. While the original always subtracted 0x10000 and relied on 2's complement to wrap the result into the int16_t range, this version has an explicit if that avoids signed wraparound (which is undefined).
Now in practice, almost all platforms in use today use 2's complement representation. In fact, if the platform has a standard-compliant stdint.h that defines int32_t, it must use 2's complement for it. Where this approach sometimes comes in handy is with scripting languages that don't have integer data types at all - you can modify the operations shown above for floats and it will give the correct result.
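The floating-point adaptation can be sketched in C using only double arithmetic. This is an illustration under the assumption that the target language offers nothing but floats; the function name le16_to_signed_double is invented. Inverting a 16-bit value corresponds to 65535 - value, so the "invert, negate, subtract 1" rule carries over directly.

```c
#include <stdint.h>

/* Sketch: decode a little-endian byte pair using only floating-point
 * arithmetic. Every intermediate value is a small integer, which a
 * double represents exactly. */
double le16_to_signed_double(const uint8_t data[2]) {
    double value = (double)data[0] + 256.0 * (double)data[1];
    if (value >= 32768.0) {
        /* 65535 - value plays the role of ~value on 16 bits;
         * then negate and subtract 1, as in the integer version. */
        return -(65535.0 - value) - 1.0;
    }
    return value;
}
```

The negative branch simplifies to value - 65536.0, which is the form one would normally write in a scripting language.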
The shift and bitwise-or operators in the expression (uint16_t)data[0] | ((uint16_t)data[1] << 8) are not performed on types smaller than int, so those uint16_t values get promoted to int (or unsigned if sizeof(uint16_t) == sizeof(int)). Still, that should yield the correct answer, since only the lower 2 bytes contain the value.
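The promotion can be observed with C11 _Generic, which selects a branch based on the type of an expression without evaluating it. The sketch below assumes a platform where int is wider than 16 bits; both function names are invented for the example.

```c
#include <stdint.h>

/* Reports the type of the naive expression after integer promotion:
 * 1 for int, 2 for unsigned int, 0 for anything else.
 * Assumption: int is wider than 16 bits, so uint16_t promotes to int. */
int promoted_type_of_or(void) {
    return _Generic((uint16_t)0 | ((uint16_t)0 << 8),
                    int: 1,
                    unsigned int: 2,
                    default: 0);
}

/* Value of the naive expression before it is narrowed to int16_t. */
int naive_expression_value(uint8_t lo, uint8_t hi) {
    return (uint16_t)lo | ((uint16_t)hi << 8);
}
```

With a 32-bit int, promoted_type_of_or() reports int, and naive_expression_value(0xFF, 0xFF) is 65535: the bits are right, but the value is outside the range of int16_t, which is where the implementation-defined conversion comes from.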
Another pedantically correct version for big-endian to little-endian conversion (assuming little-endian CPU) is:
#include <string.h>
#include <stdint.h>

int16_t be16_to_cpu_signed(const uint8_t data[2]) {
    int16_t r;
    memcpy(&r, data, sizeof r);
    return __builtin_bswap16(r);
}
memcpy is used to copy the representation of int16_t, and that is the standard-compliant way to do so. This version also compiles into a single movbe instruction, see assembly.
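__builtin_bswap16 is a GCC/Clang builtin rather than standard C. A sketch of the same idea without it might look as follows, under the same assumption of a little-endian host; the function name is invented. It also keeps the swapped value in a uint16_t and copies it into the int16_t with memcpy, so no out-of-range unsigned-to-signed conversion takes place.

```c
#include <stdint.h>
#include <string.h>

/* Sketch: big-endian bytes to host int16_t without compiler builtins.
 * Assumption: the host is little-endian, so the two bytes loaded by
 * memcpy must be swapped. */
int16_t be16_to_cpu_signed_portable(const uint8_t data[2]) {
    uint16_t u;
    int16_t r;
    memcpy(&u, data, sizeof u);            /* load raw bytes in host order */
    u = (uint16_t)((u >> 8) | (u << 8));   /* swap the two bytes */
    memcpy(&r, &u, sizeof r);              /* reinterpret the bits as signed */
    return r;
}
```

On a little-endian host, the bytes 0xFF 0xFE decode to -2.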
Another method - using union:
#include <stdint.h>

union B2I16
{
    int16_t i;
    uint8_t b[2];   /* uint8_t instead of the non-standard byte type */
};
In the program:
...
union B2I16 conv;   /* in C the union keyword is required here */
conv.b[0] = first_byte;
conv.b[1] = second_byte;
int16_t result = conv.i;
first_byte and second_byte can be swapped according to the little or big endian model. This method is not better, but it is one of the alternatives.
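How the swap "according to the endian model" might be decided at run time can be sketched like this. The type and function names are made up for the example, and it relies on reading a union member other than the one last written, which C permits as a reinterpretation of the stored bytes but which some consider debatable.

```c
#include <stdint.h>

/* Same layout as the union above, under a hypothetical name. */
union b2i16_sketch {
    int16_t i;
    uint8_t b[2];
};

/* Returns 1 if the host stores the least significant byte first. */
int host_is_little_endian(void) {
    union { uint16_t u; uint8_t b[2]; } probe = { .u = 1 };
    return probe.b[0] == 1;
}

/* Decodes a little-endian byte pair through the union, placing each
 * byte according to the host byte order. */
int16_t le16_via_union(uint8_t first_byte, uint8_t second_byte) {
    union b2i16_sketch conv;
    if (host_is_little_endian()) {
        conv.b[0] = first_byte;
        conv.b[1] = second_byte;
    } else {
        conv.b[0] = second_byte;
        conv.b[1] = first_byte;
    }
    return conv.i;
}
```

Here first_byte is the low-order byte of the input; for big-endian input the two arguments would simply be passed in the opposite order.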
Here is another version that relies only on portable and well-defined behaviours (the header <endian.h> is not standard, but the code is):
#include <endian.h>
#include <stdint.h>
#include <string.h>

static inline void swap(uint8_t* a, uint8_t* b) {
    uint8_t t = *a;
    *a = *b;
    *b = t;
}

static inline void reverse(uint8_t* data, int data_len) {
    for(int i = 0, j = data_len / 2; i < j; ++i)
        swap(data + i, data + data_len - 1 - i);
}

int16_t be16_to_cpu_signed(const uint8_t data[2]) {
    int16_t r;
#if __BYTE_ORDER == __LITTLE_ENDIAN
    uint8_t data2[sizeof r];
    memcpy(data2, data, sizeof data2);
    reverse(data2, sizeof data2);
    memcpy(&r, data2, sizeof r);
#else
    memcpy(&r, data, sizeof r);
#endif
    return r;
}
The little-endian version compiles to a single movbe instruction with clang; the gcc version is less optimal, see assembly.
I want to thank all contributors for their answers. Here is what the collective work boils down to:
1. As per the C Standard 7.20.1.1 Exact-width integer types: types uint8_t, int16_t and uint16_t must use two's complement representation without any padding bits, so the actual bits of the representation are unambiguously those of the 2 bytes in the array, in the order specified by the function names.
2. Computing the unsigned 16-bit value with (unsigned)data[0] | ((unsigned)data[1] << 8) (for the little endian version) compiles to a single instruction and yields an unsigned 16-bit value.
3. As per the C Standard 6.3.1.3 Signed and unsigned integers: converting a value of type uint16_t to the signed type int16_t has implementation-defined behavior if the value is not in the range of the destination type. No special provision is made for types whose representation is precisely defined.
4. To avoid this implementation-defined behavior, one can test if the unsigned value is larger than INT16_MAX and compute the corresponding signed value by subtracting 0x10000. Doing this for all values as suggested by zwol may produce values outside the range of int16_t with the same implementation-defined behavior.
5. Testing for the 0x8000 bit explicitly causes the compilers to produce inefficient code.
6. Type punning can be performed portably and with defined behavior using memcpy.

Combining the unsigned computation from point 2 with the memcpy type punning from the last point, here is a portable and fully defined solution that compiles efficiently to a single instruction with both gcc and clang:
#include <stdint.h>
#include <string.h>

int16_t be16_to_cpu_signed(const uint8_t data[2]) {
    int16_t r;
    uint16_t u = (unsigned)data[1] | ((unsigned)data[0] << 8);
    memcpy(&r, &u, sizeof r);
    return r;
}

int16_t le16_to_cpu_signed(const uint8_t data[2]) {
    int16_t r;
    uint16_t u = (unsigned)data[0] | ((unsigned)data[1] << 8);
    memcpy(&r, &u, sizeof r);
    return r;
}
The resulting x86-64 code:

be16_to_cpu_signed(unsigned char const*):
    movbe ax, WORD PTR [rdi]
    ret
le16_to_cpu_signed(unsigned char const*):
    movzx eax, WORD PTR [rdi]
    ret
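For comparison, the conditional-subtraction approach from the summary (test whether the unsigned value exceeds the largest int16_t value, and subtract 0x10000 only in that case) might be sketched as follows. The function name is invented; the arithmetic is done in int32_t so every intermediate value is in range and no implementation-defined conversion occurs.

```c
#include <stdint.h>

/* Sketch of conditional subtraction for little-endian input. */
int16_t le16_to_cpu_signed_cond(const uint8_t data[2]) {
    uint16_t u = (uint16_t)((unsigned)data[0] | ((unsigned)data[1] << 8));
    if (u > INT16_MAX) {
        /* 32768..65535 maps to -32768..-1; the difference always
         * fits in int32_t and in int16_t. */
        return (int16_t)((int32_t)u - 0x10000);
    }
    return (int16_t)u;   /* 0..32767 is representable as-is */
}
```

This is fully defined as well, but it contains a branch in the source, which is the efficiency concern raised in the summary; the memcpy version avoids it.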
Why not just use your "naive solution," but cast each element to int16_t instead of uint16_t?
int16_t le16_to_cpu_signed(const uint8_t data[static 2]) {
    return (int16_t)data[0] | ((int16_t)data[1] << 8);
}
Then you would not have to deal with casting unsigned ints to signed ints (and possibly being out of the signed int range).
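Whether this suggestion sidesteps the range problem can be checked by looking at the value of the return expression before it is narrowed. The sketch below assumes int is wider than 16 bits, so the int16_t operands are promoted to int and the shift cannot overflow; the helper name is invented for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: the value of the return expression above,
 * before conversion to int16_t.
 * Assumption: int is wider than 16 bits. */
int int16_cast_expression(const uint8_t data[2]) {
    return (int16_t)data[0] | ((int16_t)data[1] << 8);
}
```

For the bytes 0xFF 0xFF this yields 65535, not -1: each byte is non-negative after the cast, so the combined value still lies in 0..65535 and the final conversion to int16_t remains out of range whenever the top bit is set.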