
C/C++ getting struct size

Today, to my great surprise, I discovered that:

"When the sizeof operator is applied to a class, struct, or union type, the result is the number of bytes in an object of that type, plus any padding added to align members on word boundaries. The result does not necessarily correspond to the size calculated by adding the storage requirements of the individual members."

I didn't know about this, and I'm pretty sure it is breaking some of my old code. To read binary files I used to have structs like this one:

#include <stdint.h>

struct Header
{
    union {
        char identc[4];
        uint32_t ident;    // 4 bytes, shares storage with identc
    };
    uint16_t version;      // 2 bytes
};

and to read those 6 bytes directly with fread, driven by sizeof:

fread( &header, sizeof(header), 1, f );

But now sizeof(header) returns 8!


Is it possible that with older GCC versions sizeof(header) returned 6, or have I completely lost my mind?

Anyway, is there any other operator (or preprocessor directive or whatever) that tells me how big the struct is, excluding padding?

Otherwise, what would be a clean way to read a raw-data struct from a file without writing too much code?


EDIT: I know that this isn't the correct way to read/write binary data: I'd get different results depending on machine endianness and so on. Anyway, this method is the fastest one. I'm just trying to read some binary data to quickly get at its content, not to write a good application that I'm going to use in the future or release.

What you want is the #pragma pack directive. It lets you set the struct packing to whatever alignment you want. Typically you set the packing value to 1 before your structure definition and then restore the default value after the definition.

Note that this does nothing to guarantee portability between systems.

See also: use-of-pragma-in-c and various other questions on SO.
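
As a rough sketch of the #pragma pack approach described above: the push/pop form used here is a compiler extension accepted by GCC, Clang and MSVC, not standard C. The names PackedHeader and read_packed_header are made up for illustration, and the fields are still read in the host's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#pragma pack(push, 1)   /* save the current packing, then pack to 1-byte boundaries */
struct PackedHeader {
    union {
        char     identc[4];
        uint32_t ident;
    };
    uint16_t version;
};
#pragma pack(pop)       /* restore whatever packing was in effect before */

/* Fails at compile time if the compiler still inserted padding. */
static_assert(sizeof(struct PackedHeader) == 6, "PackedHeader should have no padding");

/* Hypothetical helper: reads one 6-byte header.
   Returns 1 on success, 0 on a short read. */
int read_packed_header(FILE *f, struct PackedHeader *out)
{
    return fread(out, sizeof *out, 1, f) == 1;
}
```

Keeping the pragma wrapped tightly around the one definition avoids accidentally packing every struct declared later in the same translation unit.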

Yes, the code you presented isn't portable. Not only structure sizes but also byte orders might differ.

This is not the correct way to process binary files. Aside from alignment issues, it also has endianness issues. The proper way to read binary files is to read into an array of uint8_t (or unsigned char, it really doesn't matter) and use your own functions to build an in-memory representation from the data.
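
A minimal sketch of that byte-array approach, assuming the file stores the version field in little-endian order (the question does not say which order the file uses). ParsedHeader, load_le16, header_decode and header_read are illustrative names, not part of the original code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum { HEADER_WIRE_SIZE = 6 };  /* 4-byte ident + 2-byte version, no padding on disk */

/* In-memory form; its layout and padding no longer matter for file I/O. */
struct ParsedHeader {
    char     identc[4];
    uint16_t version;
};

/* Decode a 16-bit little-endian value (assumed file byte order). */
uint16_t load_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Build the in-memory header from its 6 on-disk bytes. */
void header_decode(const uint8_t buf[HEADER_WIRE_SIZE], struct ParsedHeader *out)
{
    assert(buf != NULL && out != NULL);
    memcpy(out->identc, buf, 4);
    out->version = load_le16(buf + 4);
}

/* Read and decode one header. Returns 1 on success, 0 on a short read. */
int header_read(FILE *f, struct ParsedHeader *out)
{
    uint8_t buf[HEADER_WIRE_SIZE];
    if (fread(buf, 1, sizeof buf, f) != sizeof buf)
        return 0;
    header_decode(buf, out);
    return 1;
}
```

Because the decoder works on individual bytes, the result is the same regardless of the host's endianness or the compiler's padding choices.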

Most compilers provide a specific extension that allows you to control the packing of structs. However, when you write the struct in binary, you should be able to just write it and read it back regardless of packing, provided both sides are built with the same compiler and settings: writing the struct also writes sizeof(struct) bytes. The only case where this causes trouble is if you want to read files created with previous versions. You also need to consider byte-order issues, etc.

Your question is compiler-specific, but generally if you build your structure so that each member lies on a boundary of the same size as itself (four-byte elements on boundaries divisible by four, etc.), you'll get the behavior you want. Also watch for cases like the one you presented, where padding comes at the end of a structure so that the first element of the next structure would start aligned if the structures were laid out in an array.
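
One way to follow this advice is to spell the padding out as a real member, so the compiler has nothing left to insert. The sketch below uses a hypothetical AlignedHeader with a made-up reserved field. Note that it describes an 8-byte record, so it only helps when you control the file format; it does not make an existing 6-byte header readable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-byte on-disk header: padding is explicit, not implied. */
struct AlignedHeader {
    uint32_t ident;     /* offset 0, on a 4-byte boundary */
    uint16_t version;   /* offset 4, on a 2-byte boundary */
    uint16_t reserved;  /* offset 6, occupies what would be tail padding */
};

/* Compile-time checks that the compiler added no hidden padding. */
static_assert(offsetof(struct AlignedHeader, version) == 4, "padding before version");
static_assert(offsetof(struct AlignedHeader, reserved) == 6, "padding before reserved");
static_assert(sizeof(struct AlignedHeader) == 8, "unexpected tail padding");
```

With the checks in place, a compiler or flag change that alters the layout breaks the build instead of silently corrupting file reads.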

It seems that you haven't actually asked a question, so I'm not sure why I am even trying to answer! But yes, packing is important and will change depending on compiler versions, flags, target architecture, pragmas, wind direction, phases of the moon and potentially many other things. Dumping binary to a file (or socket) is not a very good way of serializing anything.

This extra padding is necessary to keep the members properly aligned when you create an array of these structures. Without it, the second element of the array would have its ident member at an address that's not a multiple of 4.
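
The array argument can be checked directly. In this small sketch, header_tail_padding and header_stride are illustrative helpers and the struct repeats the question's layout with <stdint.h> types. On the asker's compiler the values would be 2 and 8, but the exact numbers depend on the target's alignment rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as the Header in the question. */
struct Header {
    union {
        char     identc[4];
        uint32_t ident;
    };
    uint16_t version;
};

/* Bytes of padding that follow the last member. */
size_t header_tail_padding(void)
{
    size_t end_of_version = offsetof(struct Header, version) + sizeof(uint16_t);
    assert(sizeof(struct Header) >= end_of_version);
    return sizeof(struct Header) - end_of_version;
}

/* Distance in bytes between consecutive array elements.
   This always equals sizeof(struct Header), which is why sizeof
   has to include the tail padding. */
size_t header_stride(void)
{
    struct Header two[2];
    return (size_t)((char *)&two[1] - (char *)&two[0]);
}
```

Since the stride equals sizeof(struct Header) and that size is a multiple of the alignment of ident, every element of an array starts on a boundary where ident is correctly aligned.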

It is probably too late to do anything about it: you have probably already written files with this structure, and changing the packing will make those files unreadable. But yes, having file data that depends on compiler settings isn't the greatest idea. Storing data in a human-readable format is common these days. The disk bytes and CPU cycles a binary format saves are rarely worth it.

Yes, the alignment problem. That is why internet protocol messages use aligned structs, so that this problem is avoided when sending data over the network.

You can either fix your structs so that they are aligned properly, or write marshalling functions that you use when saving and retrieving data.
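
A marshalling function for the saving side might look like the sketch below. It assumes a little-endian on-disk format, and SavedHeader, store_le16, saved_header_encode and saved_header_write are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum { SAVED_HEADER_SIZE = 6 };  /* bytes written to disk, independent of sizeof */

struct SavedHeader {
    char     identc[4];
    uint16_t version;
};

/* Store a 16-bit value as two little-endian bytes (assumed file byte order). */
void store_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)(v >> 8);
}

/* Marshal the in-memory header into exactly SAVED_HEADER_SIZE bytes. */
void saved_header_encode(const struct SavedHeader *in, uint8_t buf[SAVED_HEADER_SIZE])
{
    assert(in != NULL && buf != NULL);
    memcpy(buf, in->identc, 4);
    store_le16(buf + 4, in->version);
}

/* Encode and write one header. Returns 1 on success, 0 on a short write. */
int saved_header_write(FILE *f, const struct SavedHeader *in)
{
    uint8_t buf[SAVED_HEADER_SIZE];
    saved_header_encode(in, buf);
    return fwrite(buf, 1, sizeof buf, f) == sizeof buf;
}
```

The file layout is now fixed by the encode function rather than by the compiler, so struct padding and packing settings stop mattering.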

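Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.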

 