Convert uint8_t* to uint64_t

What is the recommended way to convert the bytes of a uint8_t array at offset i to a uint64_t, and why?

uint8_t * bytes = ...
uint64_t const v = ((uint64_t *)(bytes + i))[0];

or

uint64_t const v = ((uint64_t)(bytes[i+7]) << 56)
                 | ((uint64_t)(bytes[i+6]) << 48)
                 | ((uint64_t)(bytes[i+5]) << 40)
                 | ((uint64_t)(bytes[i+4]) << 32)
                 | ((uint64_t)(bytes[i+3]) << 24)
                 | ((uint64_t)(bytes[i+2]) << 16)
                 | ((uint64_t)(bytes[i+1]) << 8)
                 | ((uint64_t)(bytes[i]));

There are two primary differences.

One, the behavior of ((uint64_t *)(bytes + i))[0] is not defined by the C standard (unless certain prerequisites about what bytes points to are met). Generally, an array of bytes should not be accessed using a uint64_t type.

When memory defined as one type is accessed with another type, it is called aliasing, and the C standard only defines certain combinations of aliasing. Some compilers may support some aliasing beyond what the standard requires, but using it is not portable. Additionally, if bytes + i is not suitably aligned for a uint64_t, the access may cause an exception or otherwise malfunction.

Two, loading the bytes through aliasing, if it is defined (by the standard or by a compiler extension), interprets the bytes using the byte ordering of the C implementation. Some C implementations store the bytes representing an integer with the least significant byte at the lowest address and the most significant byte at the highest address, and some store them in the opposite order. (They can be stored in other orders too, although this is rare.) So loading the bytes this way will produce different values from the same bytes in memory, depending on which order the C implementation uses.

But loading the bytes and using shifts to combine them will always produce the same value from the same bytes in memory, regardless of what order the C implementation uses.
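As a small illustration of that last point, here is a sketch of the shift-based load written as a loop. The helper name load_le64_at is hypothetical (it is not from the question); it simply wraps the same arithmetic as the second snippet in the question.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: combine 8 bytes starting at bytes[i], treating
 * bytes[i] as the least significant byte. The result depends only on
 * the byte values, not on how the host stores integers in memory. */
static uint64_t load_le64_at(const uint8_t *bytes, size_t i)
{
    uint64_t v = 0;
    for (int k = 7; k >= 0; --k)
        v = (v << 8) | bytes[i + k];   /* v is uint64_t, so the shift is done in 64 bits */
    return v;
}
```

With bytes holding 0xAA, 0x01, 0x02, ..., 0x08 and i equal to 1, this yields 0x0807060504030201 on every implementation.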

The first method should be avoided, because there is no need for it. If one desires to interpret the bytes using the C implementation's ordering, this can be done with:

uint64_t t;
memcpy(&t, bytes+i, sizeof t);
const uint64_t v = t;

Using memcpy provides a portable way of aliasing the uint64_t to store bytes into it. Good compilers recognize this idiom and will optimize the memcpy to a load from memory, if suitable for the target architecture (and if optimization is enabled).
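If this is needed in more than one place, the memcpy idiom can be wrapped in a small function. The following is only a sketch; the name load_native64 is made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a uint64_t from p using the host's own byte
 * order. p does not need to be aligned for uint64_t, because memcpy
 * copies byte by byte as far as the language is concerned. */
static uint64_t load_native64(const uint8_t *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

A call such as load_native64(bytes + i) then replaces the cast-and-dereference from the question. The value it returns for a given byte sequence still depends on the implementation's byte order.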

If one desires to interpret the bytes using little-endian ordering, as shown in the code in the question, then the second method may be used. (Sometimes platforms will have routines that may provide more efficient code for this.)

You can also use memcpy:

uint64_t n;
memcpy(&n, bytes + i, sizeof(uint64_t));
const uint64_t v = n;

The first option has two big problems, each of which is undefined behavior (anything can happen):

  • A uint8_t* or array of uint8_t is not necessarily aligned the way a larger type like uint64_t requires. Simply casting to uint64_t* can lead to misaligned access. This can cause hardware exceptions, program crashes, slower code, etc., all depending on the alignment requirements of the specific target.

  • It violates the internal type system of C, where each object in memory known by the compiler has an "effective type" that the compiler keeps track of. Based on this, the compiler is allowed to make certain assumptions during optimization about whether a certain memory region has been accessed or not. If your code violates these type rules, as it would in this case, wrong machine code could get generated.

    This is most commonly referred to as the strict aliasing rule, and your cast followed by a dereference would be a so-called "strict aliasing violation".
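To make the alignment point concrete, here is a sketch of how one might test whether a pointer happens to be suitably aligned for uint64_t. The function name is hypothetical, and the check relies on the common (implementation-defined) mapping from pointers to uintptr_t. Passing this check does not make the cast in the first option valid, since the strict aliasing problem remains.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if p is a multiple of uint64_t's alignment.
 * Assumes converting a pointer to uintptr_t preserves the address value,
 * which holds on mainstream platforms but is not guaranteed by the standard. */
static bool is_aligned_for_u64(const void *p)
{
    return (uintptr_t)p % alignof(uint64_t) == 0;
}
```

On a target where uint64_t needs 8-byte alignment, bytes + i can only satisfy this for one value of i in every eight.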

The second option is sound code, because:

  • When doing shifts or other forms of bitwise arithmetic, a large integer type should be used. That is, unsigned int or larger, depending on the system. Using signed types or small integer types can lead to undefined behavior or unexpected results. See "Implicit type promotion rules" regarding problems with small integer types implicitly changing signedness in some expressions.

    If not for the cast to uint64_t, the bytes[i+7] << 56 shift would involve an implicit promotion of the left operand from uint8_t to int, which would be a bug: if the most significant bit (MSB) of the byte is set and we shift into or beyond the sign bit, we invoke undefined behavior. Again, anything can happen.

    And naturally we need to use a 64-bit type in this specific case, or otherwise we wouldn't be able to shift as far as 56 bits. Shifting by an amount greater than or equal to the width of the (promoted) left operand is also undefined behavior.
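The promotion issue is easiest to see in a 32-bit analogue, sketched below. The helper load_le32 is hypothetical and not part of the question; it assumes the common case of a 32-bit int.

```c
#include <stdint.h>

/* Hypothetical 32-bit analogue of the question's second snippet.
 * Each byte is converted to uint32_t BEFORE shifting. Without the cast,
 * p[3] would be promoted to (signed) int, and with a 32-bit int a value
 * such as 0x80 << 24 would shift into the sign bit: undefined behavior. */
static uint32_t load_le32(const uint8_t *p)
{
    return  (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

For the bytes 0x78, 0x56, 0x34, 0x80 this returns 0x80345678, with the top bit set and no signed arithmetic involved.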

Note that whether to pick the order bytes[i+7] << 56 or the alternative bytes[i+0] << 56 depends on the endianness of the data stored in the array. Bit shifts are nice since the shift itself does not depend on whether the destination type is stored as big or little endian. But in this case you must know in advance which byte in the source array should become the most significant one. The code you have here will work if the array was built using little-endian formatting, since the last byte of the array is shifted into the most significant position.
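For comparison, a sketch of the alternative ordering, for an array that was written most significant byte first. The name load_be64 is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: the big-endian counterpart of the question's
 * second snippet. p[0] becomes the most significant byte of the result. */
static uint64_t load_be64(const uint8_t *p)
{
    return ((uint64_t)p[0] << 56)
         | ((uint64_t)p[1] << 48)
         | ((uint64_t)p[2] << 40)
         | ((uint64_t)p[3] << 32)
         | ((uint64_t)p[4] << 24)
         | ((uint64_t)p[5] << 16)
         | ((uint64_t)p[6] << 8)
         |  (uint64_t)p[7];
}
```

Given the bytes 0x01 through 0x08, this returns 0x0102030405060708, whereas the little-endian version in the question returns 0x0807060504030201 for the same input.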

As for the uint64_t const v = , the const qualifier is a bit strange to have at local scope like that. It's harmless but confusing and doesn't really add anything of value inside a local scope. I would just drop it.
