
C++ how to handle alignment padding when casting raw data to class objects

I read big sections of a file as "blobs" of data into char arrays. I know how these blobs are structured, and I have created classes for the different structures. Then I want to cast the char arrays I have read to arrays of the appropriate class objects.

This has worked well in some cases, but I have now run into a case where alignment/padding of the class members is a problem.

Here is a minimal example. Instead of reading the data from a file, I define it in data_i1, data_d1 and data_i2, then copy it into c_data. c_data represents the data read from the file and contains data_i1, data_d1 and data_i2 twice.

If alignment were not a problem, casting c_data to an array of Data should give me the initial data in data[0] and data[1].

#include <cstdio>   // printf
#include <cstring>  // memcpy

class Data {
public:
    int     i1[2];
    double  d1[3];
    int     i2[3];
};


int main()
{
    //Setting some data for the example:
    int     data_i1[2] = {  1,   100};          //2 * 4 =  8 bytes
    double  data_d1[3] = {0.1, 100.2, 200.3 };  //3 * 8 = 24 bytes
    int     data_i2[3] = {  2,   200, 305   };  //3 * 4 = 12 bytes
                                                //total = 44 bytes

    //As arrays the data is 44 bytes, but size of Data is 48 bytes:
    printf("sizeof(data_i1) = %zu\n",    sizeof(data_i1));
    printf("sizeof(data_d1) = %zu\n",    sizeof(data_d1));
    printf("sizeof(data_i2) = %zu\n",    sizeof(data_i2));
    printf("total size      = %zu\n\n",  sizeof(data_i1) + sizeof(data_d1) + sizeof(data_i2));
    printf("sizeof(Data)    = %zu\n",    sizeof(Data));


    //This can hold the 44 bytes above, twice:
    char c_data[88];

    //Copying the data from the arrays to a char array
    //In reality the data is read from a binary file to the char array
    memcpy(c_data +  0, data_i1,  8);
    memcpy(c_data +  8, data_d1, 24);
    memcpy(c_data + 32, data_i2, 12); //c_data contains data_i1, data_d1, data_i2
    memcpy(c_data + 44,  c_data, 44); //c_data contains data_i1, data_d1, data_i2 repeated twice

    //Casting the char array to a Data array:
    Data* data = (Data*)c_data;

    //The first Data object in the Data array gets the correct values:
    Data data1 = data[0];
    //The second Data object gets bad data:
    Data data2 = data[1];

    printf("data1 : [%4d, %4d] [%4.1f, %4.1f, %4.1f] [%4d, %4d, %4d]\n", data1.i1[0], data1.i1[1], data1.d1[0], data1.d1[1], data1.d1[2], data1.i2[0], data1.i2[1], data1.i2[2]);
    printf("data2 : [%4d, %4d] [%4.1f, %4.1f, %4.1f] [%4d, %4d, %4d]\n", data2.i1[0], data2.i1[1], data2.d1[0], data2.d1[1], data2.d1[2], data2.i2[0], data2.i2[1], data2.i2[2]);

    return 0;
}

The code output is:

sizeof(data_i1) = 8
sizeof(data_d1) = 24
sizeof(data_i2) = 12
total size      = 44

sizeof(Data)    = 48
data1 : [   1,  100] [ 0.1, 100.2, 200.3] [   2,  200,  305]
data2 : [ 100, -1717986918] [-92559653364574087271962722384372548731666605007261414794985472.0, -0.0,  0.0] [-390597128,  100, -858993460]
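To see where the 4 extra bytes come from, one can print each member's offset with offsetof. This is a small sketch assuming a typical 64-bit platform (4-byte int, 8-byte double aligned to 8 bytes), which matches the sizes printed above; the names kPayloadSize, kTrailingPadding and print_layout are illustrative only. On such a platform i1, d1 and i2 sit back to back at offsets 0, 8 and 32, and the extra 4 bytes are trailing padding that rounds sizeof(Data) up to a multiple of alignof(double). So data[0] happens to read correctly, but data[1] starts at byte 48 while the second record in c_data starts at byte 44.

```cpp
#include <cstddef>   // offsetof, std::size_t
#include <cstdio>    // std::printf

class Data {
public:
    int     i1[2];
    double  d1[3];
    int     i2[3];
};

// Bytes actually occupied by the members, ignoring any padding.
constexpr std::size_t kPayloadSize =
    sizeof(Data::i1) + sizeof(Data::d1) + sizeof(Data::i2);

// Padding after the last member: whatever is left between the end of i2 and sizeof(Data).
constexpr std::size_t kTrailingPadding =
    sizeof(Data) - (offsetof(Data, i2) + sizeof(Data::i2));

// Prints where each member starts and how much padding the compiler added.
void print_layout() {
    std::printf("offsetof(Data, i1) = %zu\n", offsetof(Data, i1));
    std::printf("offsetof(Data, d1) = %zu\n", offsetof(Data, d1));
    std::printf("offsetof(Data, i2) = %zu\n", offsetof(Data, i2));
    std::printf("payload = %zu, sizeof(Data) = %zu, alignof(Data) = %zu\n",
                kPayloadSize, sizeof(Data), alignof(Data));
    std::printf("trailing padding = %zu\n", kTrailingPadding);
}
```

Other ABIs may lay Data out differently (for example, some 32-bit ABIs align double to 4 bytes inside structs), which is exactly why the layout should not be hard-coded.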

How should I handle this correctly? Can I somehow disable this padding/alignment (if that is the right term)? Is it possible to add a member function to the class that specifies how the cast is done?

Before C++20, you are not allowed to simply cast a pointer to a different type and use it if you haven't actually created an object of the destination type.

Since C++20, this is allowed in your specific case: when a char array starts its lifetime, objects of implicit-lifetime type can be created implicitly inside it, and your Data happens to be an implicit-lifetime type.

But even in C++20, there is no guarantee that the struct won't contain padding (between members or at the end), so it is not safe to just cast the pointer or memcpy the whole struct. Even if you verify that there is no padding issue, you additionally need to give the storage array the correct alignment with alignas:

alignas(alignof(Data)) char c_data[sizeof(Data)*2];

and you will probably also need to call std::launder on the pointer to make it point to the implicitly created Data object:

Data* data = std::launder(reinterpret_cast<Data*>(c_data));
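Put together, the alignas/std::launder route described above could look roughly like this. It is a sketch only, and it assumes the incoming bytes already have exactly Data's in-memory layout, padding included. That is not true of the 44-byte records in the question: for those, the compile-time check data_has_no_padding is false on common 64-bit platforms, so this route only works for bytes that were originally written from Data objects. RecordBuffer, load_records, kCount and data_has_no_padding are hypothetical names.

```cpp
#include <cstddef>
#include <cstring>
#include <new>          // std::launder
#include <type_traits>

class Data {
public:
    int     i1[2];
    double  d1[3];
    int     i2[3];
};

// True only if Data contains no padding at all. This must hold before tightly
// packed file records can be treated as an array of Data.
constexpr bool data_has_no_padding =
    sizeof(Data) == sizeof(Data::i1) + sizeof(Data::d1) + sizeof(Data::i2);

constexpr std::size_t kCount = 2;

// Storage with Data's alignment and room for kCount objects.
struct RecordBuffer {
    alignas(alignof(Data)) unsigned char bytes[sizeof(Data) * kCount];
};

// Copies sizeof(buf.bytes) bytes from `raw`, which must already be in Data's exact
// in-memory layout. In C++20, memcpy implicitly creates the Data objects in the
// buffer; std::launder yields a pointer that actually refers to them.
Data* load_records(RecordBuffer& buf, const void* raw) {
    static_assert(std::is_trivially_copyable_v<Data>, "Data must be trivially copyable");
    std::memcpy(buf.bytes, raw, sizeof buf.bytes);
    return std::launder(reinterpret_cast<Data*>(buf.bytes));
}
```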

Instead of doing all of that, create an object of type Data (or an array of them) directly, which also resolves the alignment issue, and memcpy the individual members one by one to avoid padding issues:

Data data[2];

// Loop through array and `memcpy` each member individually
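The loop hinted at above could be sketched as follows, assuming the file stores each record as i1, d1, i2 back to back with no padding and in the machine's own byte order. The names kRecordSize and unpack_records are hypothetical; every size comes from sizeof, so the source offsets never depend on how the compiler pads Data.

```cpp
#include <cstddef>
#include <cstring>

class Data {
public:
    int     i1[2];
    double  d1[3];
    int     i2[3];
};

// Size of one record in the file: the three members back to back, no padding.
constexpr std::size_t kRecordSize =
    sizeof(Data::i1) + sizeof(Data::d1) + sizeof(Data::i2);

// Copies `count` tightly packed records from `raw` into `out`, one member at a time,
// so only the destination (a real Data object) is subject to Data's padding.
void unpack_records(const char* raw, Data* out, std::size_t count) {
    for (std::size_t k = 0; k < count; ++k) {
        const char* src = raw + k * kRecordSize;
        std::memcpy(out[k].i1, src, sizeof out[k].i1);
        src += sizeof out[k].i1;
        std::memcpy(out[k].d1, src, sizeof out[k].d1);
        src += sizeof out[k].d1;
        std::memcpy(out[k].i2, src, sizeof out[k].i2);
    }
}
```

The per-record body could equally live in a static member function of Data, which is one way to get the "member function that specifies how the conversion is done" asked about in the question.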

Also, do not use hard-coded numeric constants for sizes and offsets. Always use sizeof on the correct types so that you don't accidentally introduce a mismatch. Your code already has one: c_data is 88 bytes, but with sizeof(Data) == 48, data[1] covers bytes 48 to 95, so reading it accesses the storage array out of bounds.
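As a sketch of that advice, here is the question's buffer setup with every size and offset derived from sizeof instead of the literals 8, 24, 12, 44 and 88. kRecordSize, kRecordCount and fill_blob are hypothetical names. Note that the buffer is sized for packed records; reading it as two Data objects would need 2 * sizeof(Data) bytes, which is the mismatch described above.

```cpp
#include <cstddef>
#include <cstring>

// The question's sample data.
int    data_i1[2] = {  1,   100};
double data_d1[3] = {0.1, 100.2, 200.3};
int    data_i2[3] = {  2,   200, 305};

// All sizes are derived with sizeof, so changing a type or an array length
// cannot silently desynchronise the buffer size and the copy offsets.
constexpr std::size_t kRecordSize  = sizeof data_i1 + sizeof data_d1 + sizeof data_i2;
constexpr std::size_t kRecordCount = 2;

char c_data[kRecordSize * kRecordCount];

// Writes kRecordCount copies of the record into c_data; returns the bytes written.
std::size_t fill_blob() {
    std::size_t offset = 0;
    for (std::size_t r = 0; r < kRecordCount; ++r) {
        std::memcpy(c_data + offset, data_i1, sizeof data_i1);
        offset += sizeof data_i1;
        std::memcpy(c_data + offset, data_d1, sizeof data_d1);
        offset += sizeof data_d1;
        std::memcpy(c_data + offset, data_i2, sizeof data_i2);
        offset += sizeof data_i2;
    }
    return offset;
}
```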


As a non-portable alternative, compilers usually offer attributes or pragmas that force class members to be packed without any padding; see this question. However, this may come with a significant performance loss, because CPUs usually expect values of certain types to have a certain alignment. If the data isn't aligned that way, operations either take longer or, depending on the architecture, are not allowed at all.
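One such non-standard mechanism is #pragma pack, which MSVC, GCC and Clang all accept (GCC and Clang also offer __attribute__((packed))). The sketch below assumes one of those compilers and a 4-byte int / 8-byte double platform; PackedData is a hypothetical name. Reading members by value is fine, but taking a pointer or reference to a member (for example double* p = &x.d1[0]) can produce a misaligned pointer and should be avoided.

```cpp
#include <cstddef>      // offsetof
#include <type_traits>

// Non-portable: packing is a compiler extension, not standard C++.
#pragma pack(push, 1)   // no padding from here on
class PackedData {
public:
    int     i1[2];
    double  d1[3];
    int     i2[3];
};
#pragma pack(pop)       // restore the previous packing

static_assert(std::is_trivially_copyable_v<PackedData>, "must stay trivially copyable");
static_assert(alignof(PackedData) == 1, "packing removes the alignment requirement");
```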

Also, even if you pack your Data struct, the points I made above about the casting still apply. However, packing might allow you to simply declare

Data data[2];

from the start and read from the file directly into this data. (Casting with reinterpret_cast<char*>(data) and writing through that pointer is allowed if Data is trivially copyable, which it is here, assuming that the data you read actually has the proper layout for Data.)
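A minimal sketch of reading straight into such an array, assuming a packed layout via #pragma pack (a compiler extension supported by MSVC, GCC and Clang) and that the file uses this machine's byte order. PackedData and read_records are hypothetical names; std::istream::read and gcount are standard.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>      // std::istringstream, handy for feeding in-memory blobs
#include <string>
#include <type_traits>

#pragma pack(push, 1)   // non-portable: no padding, so one record is exactly 44 bytes here
class PackedData {
public:
    int     i1[2];
    double  d1[3];
    int     i2[3];
};
#pragma pack(pop)

static_assert(std::is_trivially_copyable_v<PackedData>, "byte-wise reads require this");

// Reads up to `count` records from `in` directly into `out` through a char* view
// and returns how many complete records were read.
std::size_t read_records(std::istream& in, PackedData* out, std::size_t count) {
    in.read(reinterpret_cast<char*>(out),
            static_cast<std::streamsize>(count * sizeof(PackedData)));
    return static_cast<std::size_t>(in.gcount()) / sizeof(PackedData);
}
```

For a real file, a std::ifstream opened with std::ios::binary can be passed in place of the string stream.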
