
Will a C union of uint32_t and uint8_t[4] always map the same way on little-endian architectures?

E.g., with

union {
    uint32_t double_word;
    uint8_t octets[4];
} u;

will

u.double_word = 0x12345678;

always result in:

u.octets[0] == 0x78
u.octets[1] == 0x56
u.octets[2] == 0x34
u.octets[3] == 0x12

or is this undefined behaviour?

TL;DR: Yes, the code is fine.

As noted, it contains implementation-defined behavior depending on endianness, but other than that, the behavior is well-defined and the code is portable (between little-endian machines).


Detailed answer:

One thing that's important is that the order of allocation of an array is guaranteed, C11 6.2.5/20:

An array type describes a contiguously allocated nonempty set of objects with a particular member object type, called the element type.

This means that the array of 4 uint8_t is guaranteed to follow the allocation order of the uint32_t: octets[0] overlays the byte at the lowest address, which on a little-endian system is the least significant byte.
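A minimal sketch of the mapping described above, under the assumption that endianness is probed at run time. The helper names is_little_endian and octets_match_question are hypothetical and not part of the original answer; the union is the one from the question.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if the least significant byte of a
   uint32_t is stored at the lowest address (little endian). */
static int is_little_endian(void)
{
    union { uint32_t word; uint8_t bytes[4]; } probe;
    probe.word = 1u;
    return probe.bytes[0] == 1u;
}

/* Hypothetical helper: returns 1 if the octets come out in the order
   listed in the question (0x78, 0x56, 0x34, 0x12). */
static int octets_match_question(void)
{
    union { uint32_t double_word; uint8_t octets[4]; } u;
    u.double_word = 0x12345678;
    return u.octets[0] == 0x78 && u.octets[1] == 0x56 &&
           u.octets[2] == 0x34 && u.octets[3] == 0x12;
}
```

On a little-endian machine both functions should return 1; on a big-endian machine both should return 0, since octets[0] would then hold 0x12.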

In theory, however, the compiler is free to toss in any padding at the end of a union (C11 6.7.2.1/17), but that shouldn't affect the data representation. If you want to pedantically protect against this, or more relevantly, protect against an issue in case more members are added later, you can add a compile-time assert:

#include <stdint.h>

typedef union {
    uint32_t double_word;
    uint8_t octets[4];
} u;

/* Fails to compile if the union is larger than its uint32_t member. */
_Static_assert(sizeof(u) == sizeof(uint32_t), "union u: Padding detected");

As for the representation of the intN_t and uintN_t types, they are guaranteed to have no padding bits, and the signed variants are guaranteed to use 2's complement (C11 7.20.1.1).

And finally, whether "type punning" through a union is allowed or is undefined behavior is specified a bit vaguely in C11 6.5.2.3:

A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member,[95] and is an lvalue if the first expression is an lvalue.

Where the (non-normative) note 95 provides clarification:

If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

And since we already ruled out padding bits, trap representations are not an issue.
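The reinterpretation that note 95 describes can be illustrated by comparing the union read against a plain byte copy of the stored value. This is only a sketch: the type name word_bytes and the function union_read_matches_memcpy are hypothetical, and memcpy is used here purely as a reference for "the object representation", not as something the answer prescribes.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical name for the union from the question. */
typedef union {
    uint32_t double_word;
    uint8_t octets[4];
} word_bytes;

/* Returns 1 if reading octets through the union yields exactly the
   bytes of the uint32_t's object representation. */
static int union_read_matches_memcpy(uint32_t value)
{
    word_bytes u;
    uint8_t copy[sizeof value];

    u.double_word = value;               /* store through one member   */
    memcpy(copy, &value, sizeof value);  /* raw object representation  */
    return memcmp(u.octets, copy, sizeof copy) == 0; /* read via other */
}
```

The comparison should hold regardless of endianness, because both sides look at the same four bytes in address order; only the interpretation of which byte is most significant depends on the platform.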

On a platform that actually has both of these types, C11 §7.20.1.1 p2 gives you all the needed guarantees (given that you know the endianness):

The typedef name uintN_t designates an unsigned integer type with width N and no padding bits. Thus, uint24_t denotes such an unsigned integer type with a width of exactly 24 bits.

This is enough because a byte can never have fewer than 8 bits (CHAR_BIT is at least 8), so having uint8_t available automatically means that a byte has exactly 8 bits, and a uint32_t therefore occupies exactly 4 bytes.
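If you would rather have the compiler confirm these assumptions than rely on the reasoning alone, a sketch along the following lines should work in C11. It assumes that UINT8_MAX is defined exactly when uint8_t exists, which is how <stdint.h> ties the limit macros to the optional typedefs; the message strings are made up for illustration.

```c
#include <limits.h>
#include <stdint.h>

/* <stdint.h> defines UINT8_MAX only when it also provides uint8_t. */
#ifndef UINT8_MAX
#error "uint8_t is not available on this platform"
#endif

/* With uint8_t present, a byte must be exactly 8 bits wide... */
_Static_assert(CHAR_BIT == 8, "bytes are not 8 bits wide");

/* ...so a 32-bit type without padding bits spans exactly 4 bytes. */
_Static_assert(sizeof(uint32_t) == 4, "uint32_t does not occupy 4 bytes");
```

Neither assertion is expected to fire on a platform that provides both types; they only document the guarantees in code.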
