
Determine struct size, ignoring padding

I receive datagrams over a network and I would like to copy the data into a struct with the appropriate fields (corresponding to the format of the message). There are many different types of datagrams (with different fields and sizes). Here is a simplified version (in reality the fields are always arrays of chars):

struct dg_a
{
    char id[2];
    char time[4];
    char flags;

    char end;
};

struct dg_a data;
memcpy(&data, buffer, offsetof(struct dg_a, end));

Currently I add a dummy field called end to the end of the struct so that I can use offsetof to determine how many bytes to copy.

Is there a better and less error-prone way to do this? I was looking for something more portable than adding __attribute__((packed)) and using sizeof.

--

EDIT

Several people in the comments have stated that my approach is bad, but so far nobody has given a reason why. Since the struct members are char, there are no trap representations and no padding between the members (guaranteed by the standard).

A central issue is the size of buffer (assumed to be a character array). The two calls below may copy different numbers of bytes, though the difference is perhaps only a few bytes.

memcpy(&data, buffer, offsetof(struct dg_a, end));  // 7 
// or
memcpy(&data, buffer, sizeof data);                 // 7, 8, 16 depends on alignment.
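As a rough sketch of how the first size could be computed without the dummy end member, the wire length can be derived from the last real field. The WIRE_SIZE macro and the dg_a_from_bytes function are hypothetical names introduced here for illustration, and the compile-time check assumes the 7-byte format from the question:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <string.h>   /* memcpy */

struct dg_a {
    char id[2];
    char time[4];
    char flags;       /* last real field; no dummy `end` member */
};

/* Hypothetical helper: number of bytes from the start of `type`
 * through the end of its member `last`. */
#define WIRE_SIZE(type, last) \
    (offsetof(type, last) + sizeof(((type *)0)->last))

/* Compilation fails if padding was inserted before `flags`. */
static_assert(WIRE_SIZE(struct dg_a, flags) == 7,
              "dg_a must occupy 7 bytes on the wire");

/* Copies only the wire bytes; any tail padding in *out is not written. */
static void dg_a_from_bytes(struct dg_a *out, const unsigned char *buffer)
{
    memcpy(out, buffer, WIRE_SIZE(struct dg_a, flags));
}
```

The caller still has to make sure that buffer holds at least WIRE_SIZE(struct dg_a, flags) bytes; sizeof(struct dg_a) may be larger than that value if the implementation adds tail padding.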

Consider avoiding those issues by making buffer as wide as the largest datagram structure and zero-filling it before it is populated with incoming data.

struct dg_a {
    char id[2];
    char time[4];
    char flags;
}; // no end field

union dg_all {
 struct dg_a a;
 struct dg_b b;
 ... 
 struct dg_z z;
} buffer = { 0 };

foo(&buffer, sizeof buffer); // get data

switch (bar(buffer)) {
  case 'a': {
    struct dg_a data = buffer.a;  // Ditch the memcpy
    // or maybe no need for copy, just use `buffer.a`
    break;
  }
  // other cases follow the same pattern
}
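A minimal, compilable sketch of the union idea might look like the following. The second datagram type dg_b, the dg_store function (standing in for the receive call) and the dg_type function (standing in for bar) are all assumptions made for illustration, as is the rule that the first id byte selects the type:

```c
#include <stddef.h>
#include <string.h>

struct dg_a { char id[2]; char time[4]; char flags; };
struct dg_b { char id[2]; char payload[10]; };  /* hypothetical second type */

union dg_all {
    struct dg_a a;
    struct dg_b b;
};

/* Hypothetical stand-in for the receive call: zero-fills the union,
 * then copies at most sizeof *dst bytes. Returns the bytes stored. */
static size_t dg_store(union dg_all *dst, const void *src, size_t len)
{
    memset(dst, 0, sizeof *dst);
    if (len > sizeof *dst)
        len = sizeof *dst;
    memcpy(dst, src, len);
    return len;
}

/* Assumption: the first id byte identifies the datagram type. */
static char dg_type(const union dg_all *u)
{
    return u->a.id[0];
}
```

After dg_store, a caller could switch on dg_type(&buf) and read buf.a or buf.b directly. Like the original, this relies on the implementation placing no padding between the char members.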

If the term "language" refers to a mapping between source text and behavior, the name C describes two families of languages:

  1. The family of languages which mapped "C syntax" to the behaviors of commonplace microcomputer hardware in ways which were defined more by precedent than specification, but were essentially 100% consistent throughout the 1980s and most of the 1990s among implementations targeting commonplace hardware.

  2. The family of all languages that meet the C Specification, including those processed by deliberately-capricious implementations.

Even though the authors of the C Standard recognized that it would not be practical to mandate that all implementations be suitable for all of the purposes served by C programs, a mentality has emerged in some fields that the only programs that should be considered "portable" are those which the Standard requires all implementations to support. A program which could be broken by a deliberately-capricious implementation should (given that mentality) be viewed as "non-portable" or "erroneous", even if it would benefit greatly from semantics which compilers for commonplace hardware had unanimously supported during the late 20th century, and for which the Standard defines no nice replacements.

Because compilers targeting certain fields like high-end number crunching can benefit from assuming that code won't rely upon certain hardware features, and because the authors of the Standard didn't want to get into details of deciding what implementations should be regarded as suitable for what purposes, some compiler writers really don't want to support code which attempts to overlay data onto structures. Such constructs may be more readable than code which tries to manually parse all the data, and compilers that endeavor to support such code may be able to process it more easily and efficiently than code which manually parses all the data, but since the Standard would allow compilers to assign struct layouts in silly ways if they chose to do so, compiler writers have a mentality that any code which tries to overlay data onto structures should be considered defective.

C has no standard mechanism for avoiding padding between structure elements or at the end of the structure. Many implementations provide such a thing as an extension, however, and inasmuch as you seem to want to match structure layout to network message payloads, your only alternative is to rely on such an extension.

Although using __attribute__((packed)) or a work-alike will enable you to use sizeof for your purpose, that's just a bonus. The main point of doing so is to match the structure layout to the network message structure for the benefit of your proposed memory copying. If the structure is laid out with internal padding where the protocol message has none, then a direct, whole-message copy such as you propose simply cannot work. That sizeof otherwise does not give you the correct size is only a symptom of the larger problem.
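To illustrate the point, here is a small sketch that assumes a GCC- or Clang-compatible compiler. The message layout with multi-byte integer fields is hypothetical (the question's real fields are char arrays); it is chosen only because it makes the effect of padding visible:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Natural layout: the compiler may pad after `id` and after `flags`. */
struct msg_natural {
    uint16_t id;
    uint32_t time;
    uint8_t  flags;
};

/* Packed layout (compiler extension): members are laid out back to back. */
struct __attribute__((packed)) msg_packed {
    uint16_t id;
    uint32_t time;
    uint8_t  flags;
};

/* The packed struct matches the assumed 7-byte wire format. */
static_assert(sizeof(struct msg_packed) == 7,
              "packed layout must match the 7-byte wire format");
static_assert(offsetof(struct msg_packed, time) == 2,
              "time must directly follow id");
/* The natural layout is at least as large, and usually larger. */
static_assert(sizeof(struct msg_natural) >= sizeof(struct msg_packed),
              "padding can only add bytes");
```

On many common ABIs sizeof(struct msg_natural) is 12, but that value is implementation-specific, which is why the sketch only asserts the packed size.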

Note also that you may face other issues with copying raw bytes. In particular, if you intend to exchange messages between machines with different architectures, and these messages contain integers larger than one byte, then you need to account for byte-order differences. If the protocol is well designed, then it in fact specifies byte order. Similarly, if you're passing around character data then you may need to deal with encoding issues (which may themselves have their own byte-ordering considerations).
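As a sketch of one way to handle byte order independently of the host architecture, a multi-byte integer can be assembled from individual bytes with shifts. The function names read_be32 and write_be32 are hypothetical, and big-endian (network order) is only an assumption about the protocol:

```c
#include <stdint.h>

/* Decode a 32-bit big-endian value from four bytes.
 * The result is the same on little- and big-endian hosts. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}

/* Encode a 32-bit value as four big-endian bytes. */
static void write_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}
```

Because the bytes are read one at a time, this also avoids any alignment requirement on the source buffer.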

Overall, you are unlikely to be able to build a robust, portable protocol implementation based on copying whole message payloads into corresponding structures, all at once. At minimum, you would likely need to perform message-type-specific fixup after the main copy. I recommend instead biting the bullet and writing appropriate marshalling functions for each message type into and out of the corresponding network representation. You'll more easily make this portable.
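A marshalling pair for the question's dg_a message could be sketched roughly as follows. The function names, the DG_A_WIRE_SIZE constant and the return conventions are assumptions made for illustration, not part of any existing API:

```c
#include <stddef.h>
#include <string.h>

struct dg_a {
    char id[2];
    char time[4];
    char flags;
};

/* Wire size per the assumed format: 2 id bytes, 4 time bytes, 1 flag byte. */
enum { DG_A_WIRE_SIZE = 2 + 4 + 1 };

/* Network bytes -> struct. Returns 0 on success, -1 if buf is too short. */
static int dg_a_unmarshal(struct dg_a *out, const unsigned char *buf, size_t len)
{
    if (len < DG_A_WIRE_SIZE)
        return -1;
    memcpy(out->id, buf, sizeof out->id);
    memcpy(out->time, buf + 2, sizeof out->time);
    out->flags = (char)buf[6];
    return 0;
}

/* Struct -> network bytes. Returns bytes written, or 0 if cap is too small. */
static size_t dg_a_marshal(const struct dg_a *in, unsigned char *buf, size_t cap)
{
    if (cap < DG_A_WIRE_SIZE)
        return 0;
    memcpy(buf, in->id, sizeof in->id);
    memcpy(buf + 2, in->time, sizeof in->time);
    buf[6] = (unsigned char)in->flags;
    return DG_A_WIRE_SIZE;
}
```

Each field is copied on its own, so the result does not depend on how the compiler lays out struct dg_a, and the length check rejects truncated datagrams before any byte is read.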


 