
Determine struct size, ignoring padding

I receive datagrams over a network and would like to copy the data into a struct with the appropriate fields (corresponding to the format of the message). There are many different types of datagrams (with different fields and sizes). Here is a simplified version (in reality the fields are always arrays of chars):

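#include <stddef.h>  /* offsetof */
#include <string.h>  /* memcpy */

struct dg_a
{
    char id[2];
    char time[4];
    char flags;

    char end;  /* dummy marker: offsetof(struct dg_a, end) = bytes to copy */
};

struct dg_a data;
memcpy(&data, buffer, offsetof(struct dg_a, end));  /* buffer holds the received datagram */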

Currently I add a dummy field called end at the end of the struct so that I can use offsetof to determine how many bytes to copy.
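If the aim is only to get rid of the dummy field, one option is a small macro that computes the offset just past the last real member from offsetof and sizeof. The sketch below assumes flags is the last member; the macro name END_OF and the helper copy_dg_a are hypothetical, and like the end-field trick it still assumes there is no padding between the members.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof */
#include <string.h>  /* memcpy */

/* Hypothetical helper: offset of the first byte after `member` in `type`.
   sizeof does not evaluate its operand, so the null pointer is never dereferenced. */
#define END_OF(type, member) \
    (offsetof(type, member) + sizeof(((type *)0)->member))

struct dg_a {
    char id[2];
    char time[4];
    char flags;  /* last real member; no dummy `end` field */
};

/* Copies the wire bytes of a dg_a message from `buffer` into `out`.
   Assumes `buffer` holds at least END_OF(struct dg_a, flags) bytes and
   that the struct has no padding between members. */
static void copy_dg_a(struct dg_a *out, const char *buffer)
{
    assert(out != NULL && buffer != NULL);
    memcpy(out, buffer, END_OF(struct dg_a, flags));
}
```

The macro still has to be pointed at the new last member whenever a field is appended, so it trades one maintenance hazard for a smaller one.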

Is there a better, less error-prone way to do this? I was looking for something more portable than adding __attribute__((packed)) and using sizeof.
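A portable C11 option, sketched under the assumption that the wire format is exactly the member sizes in declaration order: keep sizeof, but let static_assert reject any build in which the compiler inserted padding. This does not remove padding; it only turns a silent layout mismatch into a compile error (on common implementations an all-char struct like this one passes). DG_A_WIRE_SIZE and load_dg_a are illustrative names, not from the question.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof, size_t */
#include <string.h>  /* memcpy */

struct dg_a {
    char id[2];
    char time[4];
    char flags;
};

/* Assumed wire format of dg_a: 2 + 4 + 1 bytes with no gaps. */
#define DG_A_WIRE_SIZE (2 + 4 + 1)

/* Fail the build if the compiler inserted padding anywhere in the struct. */
static_assert(offsetof(struct dg_a, time) == 2, "padding after id");
static_assert(offsetof(struct dg_a, flags) == 6, "padding after time");
static_assert(sizeof(struct dg_a) == DG_A_WIRE_SIZE, "trailing padding in dg_a");

/* With the assertions above, sizeof is exactly the number of wire bytes.
   Returns the number of bytes consumed, or 0 if the datagram is too short. */
static size_t load_dg_a(struct dg_a *out, const char *buffer, size_t len)
{
    if (len < sizeof *out)
        return 0;
    memcpy(out, buffer, sizeof *out);
    return sizeof *out;
}
```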

---

EDIT

Several people in the comments have stated that my approach is bad, but so far nobody has given a reason why. Since the struct members are all char, there are no trap representations and no padding between the members (guaranteed by the standard).

---

A central issue is the size of buffer (assumed to be a character array). The two calls below may copy different numbers of bytes, perhaps differing by a few:

memcpy(&data, buffer, offsetof(struct dg_a, end));  // 7 on common implementations
// or
memcpy(&data, buffer, sizeof data);                 // 8, 16, ...: depends on alignment and padding

Consider avoiding those issues by using a buffer as wide as the largest message structure, zero-filled before it is populated with incoming data.

struct dg_a {
    char id[2];
    char time[4];
    char flags;
}; // no end field

union dg_all {
 struct dg_a a;
 struct dg_b b;
 ... 
 struct dg_z z;
} buffer = { 0 };

foo(&buffer, sizeof buffer); // get data (foo: your receive routine)

switch (bar(buffer)) {        // bar: determines the message type
  case 'a': {
    struct dg_a data = buffer.a;  // Ditch the memcpy
    // or maybe no need for a copy; just use buffer.a
    break;
  }
  // ... cases for the other message types
}
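A self-contained version of the union approach, as a sketch only: receive_datagram stands in for the real receive call (foo above), message_type stands in for bar and simply assumes the first id byte identifies the message, and struct dg_b is an invented second message type. None of these names come from the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct dg_a { char id[2]; char time[4]; char flags; };
struct dg_b { char id[2]; char payload[16]; };  /* hypothetical second message type */

union dg_all {
    struct dg_a a;
    struct dg_b b;
};

/* Hypothetical receive step standing in for recv()/recvfrom():
   zero-fills the union, then copies at most sizeof *buf bytes into it. */
static size_t receive_datagram(union dg_all *buf, const char *wire, size_t len)
{
    assert(buf != NULL && wire != NULL);
    memset(buf, 0, sizeof *buf);   /* zero-fill before populating */
    if (len > sizeof *buf)
        len = sizeof *buf;         /* never overrun the union */
    memcpy(buf, wire, len);
    return len;
}

/* Assumption for this sketch only: the first id byte identifies the type. */
static char message_type(const union dg_all *buf)
{
    return buf->a.id[0];
}

/* Returns the flags byte of an 'a' message, or 0 for other types. */
static char handle(const char *wire, size_t len)
{
    union dg_all buffer;
    receive_datagram(&buffer, wire, len);
    switch (message_type(&buffer)) {
    case 'a': {
        struct dg_a data = buffer.a;  /* struct assignment instead of memcpy */
        return data.flags;
    }
    default:
        break;                        /* other types omitted in this sketch */
    }
    return 0;
}
```

Because the union is zero-filled first, a truncated datagram leaves the missing fields as zero bytes rather than stale data.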

---

If the term "language" refers to a mapping between source text and behavior, the name C describes two families of languages:

  1. The family of languages which mapped "C syntax" to the behaviors of commonplace microcomputer hardware in ways that were defined more by precedent than by specification, but were essentially 100% consistent throughout the 1980s and most of the 1990s among implementations targeting commonplace hardware.

  2. The family of all languages that meet the C Specification, including those processed by deliberately capricious implementations.

Even though the authors of the C Standard recognized that it would not be practical to mandate that all implementations be suitable for all of the purposes served by C programs, a mentality has emerged in some fields that the only programs that should be considered "portable" are those which the Standard requires all implementations to support. Given that mentality, a program which could be broken by a deliberately capricious implementation should be viewed as "non-portable" or "erroneous", even if it would benefit greatly from semantics which compilers for commonplace hardware unanimously supported during the late 20th century, and for which the Standard defines no good replacements.

Because compilers targeting certain fields, such as high-end number crunching, can benefit from assuming that code won't rely upon certain hardware features, and because the authors of the Standard didn't want to get into the details of deciding which implementations should be regarded as suitable for which purposes, some compiler writers really don't want to support code which attempts to overlay data onto structures. Such constructs may be more readable than code which manually parses all the data, and compilers that endeavor to support them may be able to process them more easily and efficiently than manual parsing code. But since the Standard would allow compilers to lay out structs in silly ways if they chose to, compiler writers have a mentality that any code which tries to overlay data onto structures should be considered defective.

---

C has no standard mechanism for avoiding padding between structure elements or at the end of the structure. Many implementations provide such a thing as an extension, however, and inasmuch as you seem to want to match structure layout to network message payloads, your only alternative is to rely on such an extension.

Although using __attribute__((packed)) or a work-alike will enable you to use sizeof for your purpose, that's just a bonus. The main point of doing so is to match the structure layout to the network message structure for the benefit of your proposed memory copying. If the structure is laid out with internal padding where the protocol message has none, then a direct, whole-message copy such as you propose simply cannot work. That sizeof otherwise does not give you the size you want is only a symptom of the larger problem.
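To make the extension route concrete, here is a sketch with a made-up header (msg_hdr: a 1-byte type followed by a 4-byte length), where padding actually appears. #pragma pack(push, 1) and #pragma pack(pop) are accepted by GCC, Clang and MSVC but are not part of ISO C; the static_assert lines check that the packed layout matches the assumed 5-byte wire format.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof */
#include <stdint.h>

/* Natural layout: on common ABIs the compiler pads after `type`
   so that `length` is 4-byte aligned. */
struct msg_hdr_natural {
    uint8_t  type;
    uint32_t length;
};

/* Packed layout via a compiler extension (GCC, Clang, MSVC), not ISO C. */
#pragma pack(push, 1)
struct msg_hdr_packed {
    uint8_t  type;
    uint32_t length;
};
#pragma pack(pop)

/* Compile-time check that packing produced the assumed 5-byte wire layout. */
static_assert(sizeof(struct msg_hdr_packed) == 5, "pack(1) not honored");
static_assert(offsetof(struct msg_hdr_packed, length) == 1, "unexpected padding");
```

Even with packing, the length field still holds whatever byte order the sender used.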

Note also that you may face other issues with copying raw bytes, too. In particular, if you intend to exchange messages between machines with different architectures, and these messages contain integers larger than one byte, then you need to account for byte-order differences. If the protocol is well designed, then it in fact specifies byte order. Similarly, if you're passing around character data then you may need to deal with encoding issues (which may themselves have their own byte-ordering considerations).

Overall, you are unlikely to be able to build a robust, portable protocol implementation based on copying whole message payloads into corresponding structures all at once. At minimum, you would likely need to perform message-type-specific fixup after the main copy. I recommend instead biting the bullet and writing appropriate marshalling functions for each message type, converting into and out of the corresponding network representation. That will make the code much easier to keep portable.
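A minimal marshalling sketch for the dg_a message from the question, assuming its wire layout is id (2 bytes), time (4 bytes), flags (1 byte) with no gaps. Each field is copied from its own wire offset, so any padding the compiler places inside struct dg_a is irrelevant, and the length check addresses the buffer-size concern raised above. dg_a_decode and dg_a_encode are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct dg_a {
    char id[2];
    char time[4];
    char flags;
};

/* Assumed wire layout: id at 0 (2 bytes), time at 2 (4 bytes), flags at 6. */
enum { DG_A_WIRE_SIZE = 2 + 4 + 1 };

/* Unmarshal field by field. Returns 0 if the datagram is too short,
   otherwise the number of bytes consumed. */
static size_t dg_a_decode(struct dg_a *out, const unsigned char *wire, size_t len)
{
    assert(out != NULL && wire != NULL);
    if (len < DG_A_WIRE_SIZE)
        return 0;
    memcpy(out->id, wire + 0, sizeof out->id);
    memcpy(out->time, wire + 2, sizeof out->time);
    out->flags = (char)wire[6];
    return DG_A_WIRE_SIZE;
}

/* Marshal: the inverse of dg_a_decode. `wire` must hold DG_A_WIRE_SIZE bytes. */
static size_t dg_a_encode(unsigned char *wire, const struct dg_a *in)
{
    assert(wire != NULL && in != NULL);
    memcpy(wire + 0, in->id, sizeof in->id);
    memcpy(wire + 2, in->time, sizeof in->time);
    wire[6] = (unsigned char)in->flags;
    return DG_A_WIRE_SIZE;
}
```

For message types with multi-byte integers, each field's decode step is also the natural place to apply the byte-order conversion.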

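Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.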

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM