Cast array of bytes to POD

Let's say I have an array of unsigned chars that represents a bunch of POD objects (e.g. either read from a socket or via mmap). Which types they represent and at what positions is determined at runtime, but we assume that each is already properly aligned.

What is the best way to "cast" those bytes into the respective POD type?

A solution should either be compliant with the C++ standard (let's say >= C++11) or at least be guaranteed to work with g++ >= 4.9, clang++ >= 3.5 and MSVC >= 2015U3. EDIT: On Linux and Windows, running on x86/x64 or 32/64-bit ARM.

Ideally I'd like to do something like this:

uint8_t buffer[100]; //filled e.g. from network

switch(buffer[0]) {
    case 0: process(*reinterpret_cast<Pod1*>(&buffer[4])); break;
    case 1: process(*reinterpret_cast<Pod2*>(&buffer[8+buffer[1]*4])); break;
    //...
}

or

switch(buffer[0]) {
    case 0: {
         auto* ptr = new(&buffer[4]) Pod1; 
         process(*ptr); 
    }break;
    case 1: {
         auto* ptr = new(&buffer[8+buffer[1]*4]) Pod2; 
         process(*ptr); 
    }break;
    //...
}

Both seem to work, but both are AFAIK undefined behavior in C++ 1). And just for completeness: I'm aware of the "usual" solution of just copying the stuff into an appropriate local variable:

 Pod1 tmp;
 std::copy_n(&buffer[4],sizeof(tmp), reinterpret_cast<uint8_t*>(&tmp));             
 process(tmp); 

In some situations this has no overhead, in others it does, and in some it might even be faster. But performance aside, I can no longer e.g. modify the data in place, and to be honest: it just annoys me to know that I have the right bits at an appropriate location in memory but just can't use them.


A somewhat crazy solution I came up with is this:

template<class T>
T* inplace_cast(uint8_t* data) {
    //checks omitted for brevity
    T tmp;
    std::memmove((uint8_t*)&tmp, data, sizeof(tmp));
    auto ptr = new(data) T;
    std::memmove(ptr, (uint8_t*)&tmp,  sizeof(tmp));
    return ptr;

}

g++ and clang++ seem to be able to optimize away those copies, but I think this puts a lot of burden on the optimizer and might cause other optimizations to fail. It also doesn't work with const uint8_t* (although I don't actually want to modify the data) and just looks horrible (I don't think you would get that past code review).


1) The first one is UB because it breaks strict aliasing. The second one is probably UB (discussed here) because the standard just says that the resulting object is not initialized and has indeterminate value (instead of guaranteeing that the underlying memory is untouched). I believe the equivalent C code for the first one is well defined, so compilers might allow it for compatibility with C headers, but I'm unsure of this.

The most correct way is to create a (temporary) variable of the desired POD class, and to use memcpy() to copy data from the buffer into that variable:

switch(buffer[0]) {
    case 0: {
        Pod1 var;
        std::memcpy(&var, &buffer[4], sizeof var);
        process(var);
        break;
    }
    case 1: {
        Pod2 var;
        std::memcpy(&var, &buffer[8 + buffer[1] * 4], sizeof var);
        process(var);
        break;
    }
    //...
}

The main reason for doing this is alignment: the data in the buffer may not be aligned correctly for the POD type you are using. Making a copy eliminates this problem. It also allows you to keep using the variable even if the network buffer is no longer available.

Only if you are absolutely sure that the data is properly aligned can you use the first solution you gave.
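As a rough illustration of what "properly aligned" means here, the following sketch checks a pointer against alignof(T). The helper name is_aligned_for is made up for this example, and the check assumes that converting a pointer to std::uintptr_t yields the plain address, which is implementation-defined but holds on the x86/x64 and ARM targets mentioned in the question. It only tests the alignment precondition and says nothing about the aliasing concerns raised in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns true if p is suitably aligned for a T.
// Assumes the pointer-to-integer conversion yields the address itself
// (implementation-defined, but true on mainstream flat-memory platforms).
template <class T>
bool is_aligned_for(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}
```

A call such as is_aligned_for<Pod1>(&buffer[4]) could then guard any code path that relies on the alignment assumption.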

(If you are reading in data from the network, you should always check first that the data is valid and that you won't read outside of your buffer. For example, with &buffer[8 + buffer[1] * 4], you should check that the start of that address plus the size of Pod2 does not exceed the buffer length. Luckily you are using uint8_t, otherwise you'd also have to check that buffer[1] is not negative.)
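The bounds check described above can be folded into a small memcpy-based helper. This is only a sketch: load_pod, store_pod, fits and ExamplePod are names invented for this example (ExamplePod stands in for Pod1/Pod2), and the buffer length is passed explicitly. The matching store_pod shows one way to get the effect of modifying the data "in place" by copying out, changing the value, and copying it back.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical example type standing in for Pod1/Pod2 from the question.
struct ExamplePod {
    std::uint32_t id;
    std::uint16_t flags;
};

// True if [offset, offset + size) lies inside a buffer of buf_len bytes.
// Written so that the arithmetic cannot overflow.
inline bool fits(std::size_t buf_len, std::size_t offset, std::size_t size) {
    return offset <= buf_len && buf_len - offset >= size;
}

// Copies a T out of buf at offset. Returns false and leaves out untouched
// if the read would run past the end of the buffer.
template <class T>
bool load_pod(const std::uint8_t* buf, std::size_t buf_len,
              std::size_t offset, T& out) {
    static_assert(std::is_trivial<T>::value && std::is_standard_layout<T>::value,
                  "load_pod requires a POD type");
    assert(buf != nullptr);
    if (!fits(buf_len, offset, sizeof(T))) return false;
    std::memcpy(&out, buf + offset, sizeof(T));
    return true;
}

// Copies a T back into buf at offset, with the same bounds check.
template <class T>
bool store_pod(std::uint8_t* buf, std::size_t buf_len,
               std::size_t offset, const T& in) {
    static_assert(std::is_trivial<T>::value && std::is_standard_layout<T>::value,
                  "store_pod requires a POD type");
    assert(buf != nullptr);
    if (!fits(buf_len, offset, sizeof(T))) return false;
    std::memcpy(buf + offset, &in, sizeof(T));
    return true;
}
```

With these, case 1 of the switch could become something like: Pod2 var; if (load_pod(buffer, sizeof buffer, 8 + buffer[1] * 4, var)) process(var);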

Using a union allows you to escape the strict aliasing rule. In fact, that is what unions are for. So casting a pointer to a union type from a type that is a member of the union is explicitly allowed in the C++ standard (clause 3.10, paragraph 10, bullet 6). The same thing is allowed in the C standard (6.5, paragraph 7).

Therefore, depending on the other properties, a conforming equivalent of your sample could look as follows.

union to_pod {
    uint8_t buffer[100];
    Pod1 pod1;
    Pod2 pod2;
    //...
};

uint8_t buffer[100]; //filled e.g. from network

switch(buffer[0]) {
    case 0: process(reinterpret_cast<to_pod*>(buffer + 4)->pod1); break;
    case 1: process(reinterpret_cast<to_pod*>(buffer + 8 + buffer[1]*4)->pod2); break;
    //...
}
