
Union data structure alignment

I was working with some code that I thought was bad; it had a union like:

union my_msg_union
{
  struct message5 msg;
  char buffer[256];
} message;

The buffer was filled with 256 bytes from comms. The struct is something like:

struct message5 {
 uint8 id;
 uint16 size;
 uint32 data;
 uint8 num_ids;
 uint16 ids[4];
} message5d;

The same code was being compiled for many architectures (8-bit AVR, 16-bit Philips, 32-bit ARM, 32-bit x86 and amd64).

The problem, I thought, was the use of the union: the code just copies a blob of bytes received over serial into the buffer, then reads the values out through the struct, without considering the struct's alignment or padding.

Sure enough, a quick look at sizeof(message5d) on different systems gave different results.

What surprised me, however, is that whenever the union with the char[] existed, all instances of all structs of that type, on all systems, dropped their padding/alignment and were laid out as sequential bytes.

Is this required by the C standard, or just something that compiler authors have put in to 'help'?

This code demonstrates the opposite behaviour from the one you describe:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct message5
{
    uint8_t id;
    uint16_t size;
    uint32_t data;
    uint8_t num_ids;
    uint16_t ids[4];
};

#if !defined(NO_UNION)
union my_msg_union
{
    struct message5 msg;
    char buffer[256];
};
#endif /* NO_UNION */

struct data
{
    char const *name;
    size_t offset;
};

int main(void)
{
    struct data offsets[] =
    {
        { "message5.id", offsetof(struct message5, id) },
        { "message5.size", offsetof(struct message5, size) },
        { "message5.data", offsetof(struct message5, data) },
        { "message5.num_ids", offsetof(struct message5, num_ids) },
        { "message5.ids", offsetof(struct message5, ids) },
#if !defined(NO_UNION)
        { "my_msg_union.msg.id", offsetof(union my_msg_union, msg.id) },
        { "my_msg_union.msg.size", offsetof(union my_msg_union, msg.size) },
        { "my_msg_union.msg.data", offsetof(union my_msg_union, msg.data) },
        { "my_msg_union.msg.num_ids", offsetof(union my_msg_union, msg.num_ids) },
        { "my_msg_union.msg.ids", offsetof(union my_msg_union, msg.ids) },
#endif /* NO_UNION */
    };
    enum { NUM_OFFSETS = sizeof(offsets) / sizeof(offsets[0]) };

    for (size_t i = 0; i < NUM_OFFSETS; i++)
        printf("%-25s  %3zu\n", offsets[i].name, offsets[i].offset);
    return 0;
}

Sample output (GCC 4.8.2 on Mac OS X 10.9 Mavericks, 64-bit compilation):

message5.id                  0
message5.size                2
message5.data                4
message5.num_ids             8
message5.ids                10
my_msg_union.msg.id          0
my_msg_union.msg.size        2
my_msg_union.msg.data        4
my_msg_union.msg.num_ids     8
my_msg_union.msg.ids        10

The offsets within the union are the same as the offsets within the structure, as the C standard requires.
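If you want the compiler to enforce this rather than checking printed output, a compile-time check is one option. The following is only a sketch, assuming a C11 compiler (for static_assert) and the same nested-member offsetof usage as in the program above; the assertion messages are my own wording.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

struct message5
{
    uint8_t id;
    uint16_t size;
    uint32_t data;
    uint8_t num_ids;
    uint16_t ids[4];
};

union my_msg_union
{
    struct message5 msg;
    char buffer[256];
};

/* Every union member starts at offset 0, so the struct is not moved. */
static_assert(offsetof(union my_msg_union, msg) == 0,
              "union member must start at offset 0");

/* Offsets of the struct's fields are unchanged when seen through the union. */
static_assert(offsetof(union my_msg_union, msg.size) ==
              offsetof(struct message5, size),
              "size offset differs inside the union");
static_assert(offsetof(union my_msg_union, msg.ids) ==
              offsetof(struct message5, ids),
              "ids offset differs inside the union");

/* The union is at least as large as its largest member. */
static_assert(sizeof(union my_msg_union) >= sizeof(struct message5),
              "union smaller than its struct member");
```

If any of these failed, the translation unit would not compile, which would give you the compiler and platform for a counter-example immediately.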

To show otherwise, you would have to give a complete, compilable counter-example based on the code above and specify which compiler and platform produce the deviant result, if indeed you can reproduce it.

I note that I had to change uint8 etc. to uint8_t etc., but I don't think that makes any difference. If it does, you need to specify which header provides names like uint8.


Code updated to be compilable with or without the union. Output when compiled with -DNO_UNION:

message5.id                  0
message5.size                2
message5.data                4
message5.num_ids             8
message5.ids                10
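
Since the padding is there with or without the union, reading received bytes through the struct only works if the sender happens to use the same layout. One way to avoid depending on that is to decode the buffer field by field. This is a minimal sketch, not your actual protocol: it assumes the wire format is the fields packed back to back (16 bytes) in little-endian byte order, and message5_decode, get_le16, get_le32 and MESSAGE5_WIRE_SIZE are hypothetical names I made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct message5
{
    uint8_t id;
    uint16_t size;
    uint32_t data;
    uint8_t num_ids;
    uint16_t ids[4];
};

/* Assumed packed size on the wire: 1 + 2 + 4 + 1 + 4 * 2 bytes. */
enum { MESSAGE5_WIRE_SIZE = 16 };

/* Assemble a little-endian 16-bit value byte by byte (no alignment needed). */
static uint16_t get_le16(const unsigned char *p)
{
    return (uint16_t)((uint16_t)p[0] | ((uint16_t)p[1] << 8));
}

/* Assemble a little-endian 32-bit value byte by byte. */
static uint32_t get_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode one message from len bytes at buf into *msg.
   Returns 0 on success, -1 if the buffer is too short. */
int message5_decode(struct message5 *msg, const unsigned char *buf, size_t len)
{
    assert(msg != NULL && buf != NULL);
    if (len < MESSAGE5_WIRE_SIZE)
        return -1;
    msg->id = buf[0];                 /* wire offset 0 */
    msg->size = get_le16(buf + 1);    /* wire offset 1 */
    msg->data = get_le32(buf + 3);    /* wire offset 3 */
    msg->num_ids = buf[7];            /* wire offset 7 */
    for (size_t i = 0; i < 4; i++)    /* wire offsets 8, 10, 12, 14 */
        msg->ids[i] = get_le16(buf + 8 + 2 * i);
    return 0;
}
```

Because each field is built from individual bytes, the result does not depend on the host's struct padding or byte order, so the same code should behave identically on the AVR, ARM and x86 builds. If your wire format is big-endian or has a different field order, the offsets and helpers need adjusting accordingly.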
