Casting elements of a uint8_t array to a 32-bit variable on an 8-bit or 32-bit platform

Question

Let's consider two examples.

1: 8-bit MCU/MPU/platform, little-endian

uint8_t arr[5] = {0x1,0x2,0x3,0x4,0x5};//assume &arr[0] == 0x0
uint32_t *ui32 = (uint32_t*)&arr[1];

What is the value of *ui32? 0x2030405? Does a uint32_t variable have to be placed at an address that is a multiple of 4 on this platform?

2: 32-bit MCU/MPU/platform, little-endian

Pretty much the same example:

uint8_t arr[] = {0x1,0x2,0x3,0x4,0x5, 0x6, 0x7, 0x8}; //again assume &arr[0] == 0x0
uint32_t *ui32 = (uint32_t*)&arr[1];

What is the value of *ui32?

I know that 32-bit variables should reside at an address that is a multiple of 4.

Where can I find the specification for this?

Answer 1

Language Lawyering

Your code contains undefined behavior and is non-portable. For example, on some UNIX workstations I've programmed on, memory accesses must be aligned to the size of the operand, so most but not all of the time, attempting to dereference (uint32_t*)&arr[1] would crash the program with SIGBUS, a hardware error raised by the memory bus. The compiler allows you to shoot yourself in the foot like that. Dereferencing a pointer cast the way you did also violates the strict aliasing rules of C, which causes undefined behavior.

You can get around this issue by writing uint32_t x; memcpy( &x, &arr[1], sizeof(x) ), which the standard explicitly allows. From this point on, I'll be assuming you're doing the equivalent of this. If you were not using an offset into the array, you could also type-pun with the fields of a union in C (although the rules are different in C++).
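As a minimal sketch of the memcpy() approach, the helper below reads a uint32_t from an arbitrary byte address. The name load_u32_native is hypothetical, and the result is in whatever byte order the host uses, so the value is only predictable once you know the implementation's endianness.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the original post): read a uint32_t from any
   byte address, aligned or not. The bytes are interpreted in the host's own
   byte order. */
uint32_t load_u32_native(const uint8_t *src)
{
    uint32_t x;
    assert(src != NULL);
    memcpy(&x, src, sizeof x);  /* copies the object representation; no aliasing or alignment issue */
    return x;
}
```

With the array from the question, load_u32_native(&arr[1]) would be expected to yield 0x05040302 on a little-endian implementation and 0x02030405 on a big-endian one. Optimizing compilers commonly turn a fixed-size memcpy() like this into a single load where the hardware permits it.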

By the standard, the elements of an array must be stored contiguously, with no padding between them. A memcpy() between some object x and an array of unsigned char[sizeof(x)] is legal, and the resulting bytes are called the object representation of x.
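To make the idea of an object representation concrete, here is a small sketch that copies a uint32_t into an unsigned char array and back. The helper names u32_to_bytes and u32_from_bytes are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the object representation of x into out,
   which must hold at least sizeof(uint32_t) bytes. */
void u32_to_bytes(uint32_t x, unsigned char *out)
{
    assert(out != NULL);
    memcpy(out, &x, sizeof x);
}

/* Hypothetical helper: rebuild a uint32_t from bytes in host order. */
uint32_t u32_from_bytes(const unsigned char *in)
{
    uint32_t x;
    assert(in != NULL);
    memcpy(&x, in, sizeof x);
    return x;
}
```

A round trip through these two functions returns the original value on any implementation that provides uint32_t. Which byte ends up in out[0] depends on the host's byte order.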

Copying arbitrary bits to the object representation of any of the exact-width types in <stdint.h> with memcpy() is unspecified behavior, not undefined behavior. It is a well-formed program, and you will get some valid uint32_t out of it, even though the language standard does not say what that value has to be. You aren't giving the compiler permission to do whatever it wants, such as Kill All Humans. This is only because the standard does not permit the exact-width integer types to have padding bits, and therefore they cannot have trap representations: invalid bit patterns that cause undefined behavior if read as a value of that type. (The example in the standard is an implementation that stores a parity bit in every word.)

However, the other side of that guarantee is that the types uint8_t and uint32_t are not guaranteed to exist, and there have been a few architectures in the real world for which conforming versions of them could never exist. (However, unsigned char array[sizeof(uint_least32_t) + 1] is guaranteed to work.)

TL;DR

A real-world little-endian implementation on which that code runs correctly would probably tell you that *ui32 is 0x05040302. Otherwise, we would call it something other than little-endian. However, some compilers put the onus on the programmer to follow the strict aliasing rules carefully. They are known to produce optimized code that doesn't do what you expect if you write through either pointer.
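If what you actually want is "these four bytes, least significant first", one way to get it without depending on the host's byte order, alignment rules, or aliasing rules is to assemble the value with shifts. This is a sketch, and load_le32 is a hypothetical name rather than a standard function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: interpret p[0..3] as a little-endian 32-bit value.
   Works the same on little-endian and big-endian hosts, at any address. */
uint32_t load_le32(const uint8_t *p)
{
    assert(p != NULL);
    return  (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

For the array in the question, load_le32(&arr[1]) combines the bytes 0x02, 0x03, 0x04, 0x05 into 0x05040302 on every conforming implementation. The casts to uint32_t matter on 8-bit and 16-bit targets, where int may be too narrow for the shifts.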

Answer 2

1: 8-bit MCU/MPU/platform, little-endian

uint8_t arr[5] = {0x1,0x2,0x3,0x4,0x5};//assume &arr[0] == 0x0
uint32_t *ui32 = (uint32_t*)&arr[1];

What is the value of *ui32?

C explicitly declares the effect of reading the value of *ui32 to be undefined in that case, because it reads the value of an object (part of arr) via an lvalue of a different type.

0x2030405?

It is by no means guaranteed, yet not so uncommon in practice, that the value obtained by reading *ui32 would be the result of interpreting the bit pattern comprising elements 1 through 4 of arr as that of a uint32_t. Even then, what number that pattern represents is unspecified. It is left to implementations to determine how to map physical bytes to logical ones.

However, if by "little-endian" you mean that the C implementation's uint32_t is represented by a sequence of four 8-bit bytes in least-significant to most-significant order, and if you suppose that dereferencing the pointer does successfully interpret the pointed-to bit pattern as that of a uint32_t, then the resulting value would be the same as that represented by the integer constant 0x05040302u.

Does a uint32_t variable have to be placed at an address that is a multiple of 4 on this platform?

You have not specified a platform, nor even a particularly narrow class of platforms. I would generally expect an 8-bit platform not to require 4-byte alignment for objects of type uint32_t, but C does not specify, and platforms and implementations may vary.

2: 32-bit MCU/MPU/platform, little-endian

Pretty much the same example:

Exactly the same answer, except that it is more likely, but by no means certain, that 4-byte alignment would be required for objects of type uint32_t.

I know that 32-bit variables should reside at an address that is a multiple of 4.

Not necessarily. Some 32-bit platforms do require it; some do not require it, but offer faster access for aligned objects; and some don't care at all.

Where can I find the specification for this?

Such details of your C implementation of interest as are available at all would be found in that implementation's documentation. The underlying system's ABI and / or hardware documentation might serve as a secondary source.
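Alongside the documentation, a C11 implementation can be asked directly for the alignment it requires via _Alignof. The sketch below checks a pointer against that requirement. The name is_aligned_for_u32 is hypothetical, and the check assumes that uintptr_t exists and that converting a pointer to it yields a plain numeric address, which is common but implementation-defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if p meets this implementation's alignment
   requirement for uint32_t. Assumes the pointer-to-integer conversion
   produces a flat address (implementation-defined). */
int is_aligned_for_u32(const void *p)
{
    assert(p != NULL);
    return ((uintptr_t)p % _Alignof(uint32_t)) == 0;
}
```

On an implementation where _Alignof(uint32_t) is 4, the address of arr[1] in the question (0x1) would fail this check. Where it is 1, as is plausible on an 8-bit target, every address passes. Passing the check says nothing about aliasing, so memcpy() is still the safe way to read the value.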

Overall, however, the best recommendation is usually to avoid such questions altogether. Avoiding unspecified, implementation-defined, and especially undefined behaviors would allow you to rely wholly on the C standard to predict the behavior of your program.

Answer 3

8-bit MCU/MPU/platform, little-endian

This answer assumes the platform somehow supports longer integers, even if the CPU might not, and that they are little-endian.

Note that if a microcontroller is truly 8-bit and has no notion of longer integers, then it does not make much sense to talk about its (byte) endianness. We could say, for instance, that it is both little-endian and big-endian (or that it is neither).

 //assume &arr[0] == 0x0

This may be a hint that the question comes from an exercise about misaligned accesses.

What is the value of *ui32? 0x2030405? Does a uint32_t variable have to be placed at an address that is a multiple of 4 on this platform?

It depends on the platform and on the compiler's options (e.g. if the compiler assumes strict aliasing, then this is undefined behaviour to begin with).

However, since this is an 8-bit platform (and assuming you tell the compiler to do what you seem to want to do), a fair guess is that uint32_t has to be supported in software and that unaligned accesses are not a problem. Assuming that software implementation keeps the integer in memory as little-endian (as explained above), then a good guess would be 0x05040302.
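Whether a given implementation really stores multi-byte integers least significant byte first can be probed at run time instead of guessed. The following is only a sketch with a hypothetical function name, and it looks at nothing but the first byte, so it cannot tell a big-endian layout from a more exotic mixed one.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the lowest-addressed byte of a uint32_t
   holds its least significant bits, 0 otherwise. */
int host_is_little_endian(void)
{
    const uint32_t probe = 1u;
    unsigned char first;
    memcpy(&first, &probe, 1);  /* inspect the first byte of the object representation */
    return first == 1;
}
```

If this returns 1 and uint32_t occupies four 8-bit bytes in the usual order, a memcpy() of arr[1] through arr[4] into a uint32_t would be expected to give the 0x05040302 guessed above.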

32-bit MCU/MPU/platform, little-endian: What is the value of *ui32?

Again, it depends on the platform and compiler. On some of them there would not even be a value, since the CPU would trap when you try to read from such an address (since &arr[0] == 0, ui32 == 1, which is not aligned to, e.g., 4).

I know that 32-bit variables should reside at an address that is a multiple of 4.

Typically, but it depends on the platform. Also, even if a platform supports unaligned accesses, they may be slower than aligned accesses (so you want them aligned anyhow).

Where can I find the specification for this?

On top of the C specification, you would need to check your compiler's documentation and your architecture's manuals.
