Using a cast to access a byte array like a structure?

Question

I'm working on a microcontroller-based software project. A part of the project is a parser for a binary protocol. The protocol is fixed and cannot be changed. A PC acts as the "master" and mainly transmits commands, which have to be executed by the "slave", the microcontroller board.

The protocol data is received by a hardware communication interface, e.g. UART, CAN or Ethernet. That's not the problem.

After all bytes of a frame (4 - 10, depending on the command) are received, they are stored in a buffer of type "uint8_t cmdBuffer[10]" and a flag is set, indicating that the command can now be executed. The first byte of a frame (cmdBuffer[0]) contains the command; the rest of the frame holds the parameters for the command, which may differ in number and size depending on the command. This means the payload can be interpreted in many ways. For every possible command, the data bytes change their meaning.

I don't want a lot of ugly bit operations; I want self-documenting code. So my approach is:

I create a "typedef struct" for each command.
After determining the command, the pointer to the cmdBuffer is cast to a pointer to my new typedef.
By doing so, I can access the command's parameters as structure members, avoiding magic numbers in array accesses and bit operations for parameters wider than 8 bits, and the code is easier to read.

Example:

typedef struct
{
    uint8_t commandCode;
    uint8_t parameter_1;
    uint32_t anotherParameter;
    uint16_t oneMoreParameter;
}payloadA_t;

//typedefs for payloadB_t and payloadC_t, which may have different parameters

void parseProtocolData(uint8_t *data, uint8_t length)
{
  uint8_t commandToExecute;

  //this, in fact, just returns data[0]
  commandToExecute = getPayloadType(data, length);

  if (commandToExecute == COMMAND_A)
  {
    executeCommand_A( (payloadA_t *) data);
  }
  else if (commandToExecute == COMMAND_B)
  {
    executeCommand_B( (payloadB_t *) data);
  }
  else if (commandToExecute == COMMAND_C)
  {
    executeCommand_C( (payloadC_t *) data);
  }
  else
  {
    //error, unknown command
  }
}

I see two problems with this:

First, depending on the microcontroller architecture, the byte order for 2- or 4-byte parameters may be Intel (little-endian) or Motorola (big-endian). This should not be much of a problem. The protocol itself uses network byte order. On the target controller, a macro can be used to correct the order.
The major problem: there may be padding bytes in my typedef struct. I'm using gcc, so I can just add a "packed" attribute to my typedef. Other compilers provide pragmas for this. However, on 32-bit machines, packed structures will result in bigger (and slower) machine code. OK, this may also not be a problem. But I've heard there can be a hardware fault when accessing unaligned memory (on the ARM architecture, for example).
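To make the padding concern concrete, here is a minimal sketch, assuming GCC or Clang for the `__attribute__((packed))` extension. The names `payloadA_natural_t`, `payloadA_packed_t`, `natural_offset` and `packed_offset` are made up for illustration; the members mirror the example above. The exact natural layout depends on the target ABI, so treat the comments as typical rather than guaranteed.

```c
#include <stddef.h>
#include <stdint.h>

/* Natural layout: the compiler may insert padding to align members. */
typedef struct
{
    uint8_t  commandCode;
    uint8_t  parameter_1;
    uint32_t anotherParameter;   /* typically preceded by 2 padding bytes */
    uint16_t oneMoreParameter;
} payloadA_natural_t;

/* Packed layout (GCC/Clang extension): no padding between members. */
typedef struct __attribute__((packed))
{
    uint8_t  commandCode;
    uint8_t  parameter_1;
    uint32_t anotherParameter;   /* sits at offset 2, i.e. misaligned */
    uint16_t oneMoreParameter;
} payloadA_packed_t;

/* Offset of the 32-bit member in each layout. */
size_t natural_offset(void)
{
    return offsetof(payloadA_natural_t, anotherParameter);
}

size_t packed_offset(void)
{
    return offsetof(payloadA_packed_t, anotherParameter);
}
```

Only the packed layout matches the 8-byte wire format, but its 32-bit member lands on an address that is not a multiple of 4, which is the kind of access that can fault on some cores.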

There are many commands (around 50), so I don't want to access the cmdBuffer as an array. I think the "structure approach" increases code readability compared to the "array approach".

So my questions:

Is this approach OK, or is it just a dirty hack?
Are there cases where the compiler can rely on the "strict aliasing rule" and make my approach not work?
Is there a better solution? How would you solve this problem?
Can this be kept, at least a little, portable?

Regards, lugge

Answer 1

Generally, structs are dangerous for storing data protocols because of padding. For portable code, you probably want to avoid them. Keeping the raw array of data is therefore still the best idea. You only need a way to interpret it differently depending on the received command.

This scenario is a typical example where some kind of polymorphism is desired. Unfortunately, C has no built-in support for that OO feature, so you'll have to create it yourself.

The best way to do this depends on the nature of these different kinds of data. Since I don't know that, I can only suggest one such way; it may or may not be optimal for your specific case:

typedef enum
{
  COMMAND_THIS,
  COMMAND_THAT,
  ... // all 50 commands

  COMMANDS_N // a constant which is equal to the number of commands
} cmd_index_t;


typedef struct
{
  uint8_t      command;        // the original command, can be anything
  cmd_index_t  index;          // a number 0 to 49
  uint8_t      data [MAX];     // the raw data
} cmd_t;

Step one would then be that, upon receiving a command, you translate it to the corresponding index.

// ...receive data and place it in cmdBuffer[10], then:
cmd_t cmd;
cmd_create(&cmd, cmdBuffer[0], &cmdBuffer[1]);

...

void cmd_create (cmd_t* cmd, uint8_t command, uint8_t* data)
{
   cmd->command = command;
   memcpy(cmd->data, data, MAX);

   switch(command)
   {
     case THIS: cmd->index = COMMAND_THIS; break;
     case THAT: cmd->index = COMMAND_THAT; break;
     ... 
   }
}

Once you have an index from 0 to N-1, you can implement lookup tables. Each such lookup table can be an array of function pointers, which determine the specific interpretation of the data. For example:

typedef void (*interpreter_func_t)(uint8_t* data);

const interpreter_func_t interpret [COMMANDS_N] =
{
  &interpret_this_command,
  &interpret_that_command,
  ...
};

Use:

interpret[cmd->index] (cmd->data);

Then you can make similar lookup tables for different tasks.

   initialize [cmd->index] (cmd->data);
   interpret  [cmd->index] (cmd->data);
   repackage  [cmd->index] (cmd->data);
   do_stuff   [cmd->index] (cmd->data);
   ...

Use different lookup tables for different architectures. Things like endianness can be handled inside the interpreter functions. And you can of course change the function prototypes; maybe you need to return something or pass more parameters, etc.

Note that the above example is most suitable when all commands result in the same kind of actions. If you need to do entirely different things depending on the command, other approaches are more suitable.
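The pieces above can be put together roughly as follows. This is only a sketch with two commands: the wire codes `0x10` and `0x20`, the helpers `cmd_lookup` and `cmd_dispatch`, and the handler bodies are all hypothetical, and the handler prototype has been changed to return an `int` so the result is observable.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { COMMAND_THIS, COMMAND_THAT, COMMANDS_N } cmd_index_t;

/* Handler signature: receives the raw parameter bytes, returns a result. */
typedef int (*interpreter_func_t)(const uint8_t *data);

/* Hypothetical handlers; the computations are arbitrary placeholders. */
static int interpret_this_command(const uint8_t *data) { return data[0] + 1; }
static int interpret_that_command(const uint8_t *data) { return data[0] * 2; }

/* Lookup table indexed by cmd_index_t. */
static const interpreter_func_t interpret[COMMANDS_N] =
{
    [COMMAND_THIS] = interpret_this_command,
    [COMMAND_THAT] = interpret_that_command,
};

/* Map a wire command code to a table index (codes are assumptions).
   Returns COMMANDS_N for an unknown command. */
static cmd_index_t cmd_lookup(uint8_t command)
{
    switch (command)
    {
        case 0x10: return COMMAND_THIS;
        case 0x20: return COMMAND_THAT;
        default:   return COMMANDS_N;
    }
}

/* Dispatch one frame; returns -1 for short frames or unknown commands. */
int cmd_dispatch(const uint8_t *frame, size_t length)
{
    if (frame == NULL || length < 2)
    {
        return -1;
    }

    cmd_index_t index = cmd_lookup(frame[0]);
    if (index == COMMANDS_N)
    {
        return -1;
    }

    return interpret[index](&frame[1]);
}
```

Checking the index against `COMMANDS_N` before indexing the table keeps an unknown command byte from turning into an out-of-bounds function pointer call.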

Answer 2

IMHO it is a dirty hack. The code may break when ported to a system with different alignment requirements, different variable sizes, or different type representations (e.g. big endian / little endian). It may even break on the same system with a different version of the compiler, system headers, or whatever.

I don't think it violates strict aliasing, so long as the relevant bytes form a valid representation.

I would just write code to read the data in a well-defined manner, e.g.

bool extract_A(PayloadA_t *out, uint8_t const *in)
{
    out->foo = in[0];
    out->bar = read_uint32(in + 1, 4);
    // ...
    return true;
}

This may run slightly slower than the "hack" version; whether you prefer maintenance headaches or those extra microseconds depends on your requirements.
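The answer does not show `read_uint32`, so the following is only one possible shape for such helpers, assuming the wire format is big-endian (network byte order, as the question states). The names `read_be16` and `read_be32` are hypothetical. Because they only ever read single bytes, they work at any address and on any host byte order.

```c
#include <stdint.h>

/* Read a 16-bit big-endian value from an arbitrarily aligned address. */
static inline uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}

/* Read a 32-bit big-endian value from an arbitrarily aligned address. */
static inline uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}
```

Many compilers recognise this pattern and emit a single load plus a byte swap where the target allows it, so the cost is often smaller than it looks.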

Answer 3

Answering your questions in the same order:

This approach is quite common, but it's still called a dirty hack by every book I know that mentions this technique. You spelled the reasons out yourself: in essence it's highly unportable or requires a lot of preprocessor magic to make it portable.
Strict aliasing rule: see the top-voted answer to "What is the strict aliasing rule?"
The only alternative solution I know is to explicitly code the deserialization, as you mentioned yourself. This can actually be made very readable, like this:

uint8_t *p = buffer;
struct S s;   // S stands for whatever struct type is being filled
s.field1 = read_u32(&p);
s.field2 = read_u16(&p);

I.e., I would make the read functions move the pointer forward by the number of deserialized bytes.
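A minimal sketch of such cursor-advancing readers, assuming big-endian wire data. The struct `example_t` and the function `parse_example` are hypothetical; `read_u16` and `read_u32` follow the names used in the snippet, but their bodies are a guess at what the answer has in mind.

```c
#include <stdint.h>

/* Hypothetical target structure with natural (unpacked) layout. */
typedef struct
{
    uint32_t field1;
    uint16_t field2;
} example_t;

/* Read a big-endian 16-bit value and advance the cursor by 2 bytes. */
static uint16_t read_u16(const uint8_t **p)
{
    const uint8_t *b = *p;
    *p += 2;
    return (uint16_t)(((uint16_t)b[0] << 8) | b[1]);
}

/* Read a big-endian 32-bit value and advance the cursor by 4 bytes. */
static uint32_t read_u32(const uint8_t **p)
{
    const uint8_t *b = *p;
    *p += 4;
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Deserialize field by field; the caller must supply at least 6 bytes. */
example_t parse_example(const uint8_t *buffer)
{
    const uint8_t *p = buffer;
    example_t s;

    s.field1 = read_u32(&p);
    s.field2 = read_u16(&p);
    return s;
}
```

The order of the read calls documents the wire layout, so no offsets appear in the parsing code at all.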

As said above, you can use the preprocessor to handle different endianness and struct packing.

Answer 4

It's a dirty hack. The biggest problem I see with this solution is memory alignment rather than endianness or struct packing.

The memory alignment issue is this. Some microcontroller architectures, such as ARM, require that multi-byte variables be aligned to certain memory offsets. That is, 2-byte half-words must be aligned on even memory addresses, and 4-byte words must be aligned on memory addresses that are multiples of 4. These alignment rules are not enforced by your serial protocol. So if you simply cast the serial data buffer into a packed structure, the individual structure members may not have the proper alignment. Then, when your code tries to access a misaligned member, it will result in an alignment fault or undefined behavior. (This is why the compiler creates an unpacked structure by default.)

Regarding endianness, it sounds like you're proposing to correct the byte order when your code accesses the member in the packed structure. If your code accesses the packed structure member multiple times, it will have to correct the endianness every time. It would be more efficient to correct the endianness just once, when the data is first received from the serial port. And this is another reason not to simply cast the data buffer into a packed structure.

When you receive the command, you should parse out each field individually into an unpacked structure where each member is properly aligned and has the proper endianness. Then your microcontroller code can access each member most efficiently. This solution is also more portable if done correctly.
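For the `payloadA_t` from the question, parsing once into an unpacked structure could look like the sketch below. It assumes the fields appear on the wire back to back in declaration order (8 bytes total) and in network byte order; the function name `parse_payload_A` and the constant `PAYLOAD_A_WIRE_SIZE` are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed wire size: 1 + 1 + 4 + 2 bytes, no padding. */
#define PAYLOAD_A_WIRE_SIZE 8u

/* Unpacked structure: the compiler is free to align every member. */
typedef struct
{
    uint8_t  commandCode;
    uint8_t  parameter_1;
    uint32_t anotherParameter;
    uint16_t oneMoreParameter;
} payloadA_t;

/* Convert one received frame into host representation, exactly once.
   Returns false if the arguments are invalid or the frame is too short. */
bool parse_payload_A(payloadA_t *out, const uint8_t *in, size_t length)
{
    if (out == NULL || in == NULL || length < PAYLOAD_A_WIRE_SIZE)
    {
        return false;
    }

    out->commandCode      = in[0];
    out->parameter_1      = in[1];
    out->anotherParameter = ((uint32_t)in[2] << 24) |
                            ((uint32_t)in[3] << 16) |
                            ((uint32_t)in[4] << 8)  |
                             (uint32_t)in[5];
    out->oneMoreParameter = (uint16_t)(((uint16_t)in[6] << 8) | in[7]);
    return true;
}
```

After this call every later access to `out->anotherParameter` is an ordinary aligned load in host byte order, so the conversion cost is paid once per frame rather than once per access.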

Answer 5

Yes, this is a memory alignment problem.

Which controller are you using?

Just declare the structure with the following syntax:

__attribute__((packed))

Maybe it will solve your problem.

Or you can try to access the variable by reference (through its address) instead of by value.

Answer 1
3 2014-04-25 06:59:04

Answer 2
2 2014-04-25 06:43:39

Answer 3
1 2014-04-25 06:52:48

Answer 4
1 2014-04-25 14:19:43

Answer 5
0 2014-04-25 07:09:56