
Does this use of reinterpret_cast invoke undefined behavior?

The basic idea is to create a variable-size array, fixed at construction time, together with an object of another class in a single allocation unit, in order to reduce overhead and improve efficiency. A buffer is allocated to fit the array and the other object, and placement new is used to construct them. Pointer arithmetic and reinterpret_cast are used to access the elements of the array and the other object. That seems to work (at least in gcc), but my reading of the standard (5.2.10 Reinterpret cast) tells me it is undefined behavior. Is that correct? And if so, is there any way to implement this design without UB?

A full compilable example is here: http://ideone.com/C9CCa8

// a buffer contains array of A followed by B, laid out like this
// | A[N - 1] ... A[0] | B |

class A
{
    size_t index;
//...
// using reinterpret_cast to get to B object
    const B* getB() const 
    { 
        return reinterpret_cast<const B*>(this + index + 1); 
    }
};

class B
{
    size_t a_count;
//...
    virtual ~B() {}
// using reinterpret_cast to get to the array member
    const A* getA(size_t i) const 
    { 
        return reinterpret_cast<const A*>(this) - i - 1; 
    }
};

// using placement new to construct all objects in raw memory
B* make_record(size_t a_count)
{
    char* buf = new char[a_count*sizeof(A) + sizeof(B)];
    for(auto i = 0; i < a_count; ++i)
    {
        new(buf) A(a_count - i - 1);
        buf += sizeof(A);
    }
    return new(buf) B(a_count);
}

When using placement new, it is up to you to ensure that the target memory is properly aligned for your data type; otherwise the behavior is undefined. After an array of A's, it is not guaranteed that the alignment of buf will be correct for an object of type B. Your use of reinterpret_cast is also undefined behavior.
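As a rough illustration of that alignment precondition, the sketch below measures how far an address is from the next suitably aligned boundary. The helper misalignment_for and the stand-in types Small and Wide are hypothetical and do not come from the question's code; the sketch also assumes that converting a pointer to std::uintptr_t yields a plain linear address, which holds on mainstream platforms but is not required by the standard.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of bytes by which p overshoots a T-aligned
// boundary. A result of 0 means p is suitably aligned for a T.
template <typename T>
std::size_t misalignment_for(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T);
}

// Stand-in types (assumptions for this sketch, not the question's A and B).
struct Small { char bytes[3]; };            // alignment 1
struct alignas(4) Wide { char bytes[4]; };  // alignment forced to 4

static_assert(sizeof(Small) == 3, "sketch assumes Small has no padding");

// Misalignment of the slot where a Wide would be constructed after
// `count` Small objects packed back to back in a Wide-aligned buffer.
std::size_t wide_misalignment_after(std::size_t count)
{
    alignas(Wide) unsigned char buf[64] = {};
    assert(count * sizeof(Small) + sizeof(Wide) <= sizeof(buf));
    return misalignment_for<Wide>(buf + count * sizeof(Small));
}
```

Calling wide_misalignment_after(1) reports a nonzero value, meaning a placement new of Wide at that address would violate the alignment requirement, while wide_misalignment_after(4) reports 0.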

Undefined behavior doesn't mean it won't work. It may work for a particular compiler and a particular set of class types, pointer offsets, and so on. But you cannot hand this code to an arbitrary standard-conforming compiler and be guaranteed that it will work.

Use of these hacks strongly suggests that you have not designed your solution properly.

It's an interesting question. The question is what this + index + 1 points to. If it really is a B, there should be no problem (assuming that an A* is sufficiently large to contain a B* without loss of value): "Converting a prvalue of type 'pointer to T1' to the type 'pointer to T2' (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value." (§5.2.10/7) Since you've used (basically) the same expression to obtain the address at which you construct the B, the only thing you can legally do with this + index + 1 is to convert it back to a B*.

But since you need the index variable in each element anyway, why not store a pointer rather than an index?
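A minimal sketch of that suggestion follows. The member name owner, the constructors, and the count() accessor are assumptions made for illustration; the question only shows a size_t index member and elides the rest of both classes.

```cpp
#include <cassert>
#include <cstddef>

class B;  // forward declaration so A can hold a pointer to B

class A
{
    const B* owner;  // hypothetical member replacing `size_t index`
public:
    explicit A(const B* b) : owner(b) { assert(b != nullptr); }

    // No cast and no pointer arithmetic: just return the stored pointer.
    const B* getB() const { return owner; }
};

class B
{
    std::size_t a_count;
public:
    explicit B(std::size_t n) : a_count(n) {}
    virtual ~B() {}
    std::size_t count() const { return a_count; }
};
```

On typical platforms a pointer is the same size as size_t, so this should not enlarge A. In the single-buffer layout, the address where B will be constructed is known before the A objects are built, so it can be passed to each A constructor.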

And in the end: this is a horrible solution with regard to code readability and robustness. In particular, if B has stricter alignment requirements than A, you can easily end up with a misaligned B. And if you change anything down the road, B might acquire stricter alignment requirements. I'd avoid this solution at all costs.

The sample code you posted does not show problems because both classes happen to have the same alignment requirements (and it uses even numbers of objects of class A). I modified your example somewhat to demonstrate what happens if alignof(A) < alignof(B) and you use odd numbers of A: http://ideone.com/eC7l17

Now you get this output:

B starts at 0x9003008, needs alignment 4, misaligned by 0
B has 0 As
B starts at 0x900306a, needs alignment 4, misaligned by 2
B has 1 As
A[]
B starts at 0x90030cc, needs alignment 4, misaligned by 0
B has 2 As
A[]
A[]

and interesting things would happen if you tried to use the misaligned pointer to B (recovered from A[0]).

Avi Berger already suggested a fix. I'll try to come up with a generalized template for arbitrary A and B that will do the right thing:

| A[N - 1] ... A[0] | <padding> | B |

where the padding is computed based on alignof(A) and alignof(B).
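The generalized template was not posted, so the following is only a sketch of how the padding in that layout could be computed. The names align_up, b_offset, and record_size, and the stand-in types DemoA and DemoB, are hypothetical. It assumes that alignments are powers of two and that the buffer itself starts at an address aligned for both types.

```cpp
#include <cstddef>

// Round `offset` up to the next multiple of `align` (a power of two).
constexpr std::size_t align_up(std::size_t offset, std::size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

// Byte offset of B in a buffer that starts with `n` objects of type A:
// the end of the A array, padded up to B's alignment.
template <typename A, typename B>
constexpr std::size_t b_offset(std::size_t n)
{
    return align_up(n * sizeof(A), alignof(B));
}

// Total bytes needed for | A[n] | <padding> | B |.
template <typename A, typename B>
constexpr std::size_t record_size(std::size_t n)
{
    return b_offset<A, B>(n) + sizeof(B);
}

// Stand-in types for illustration only (not the question's classes).
struct DemoA { char bytes[6]; };            // size 6, alignment 1
struct alignas(4) DemoB { char bytes[8]; }; // size 8, alignment 4

static_assert(b_offset<DemoA, DemoB>(1) == 8, "6 bytes of A, then 2 of padding");
static_assert(b_offset<DemoA, DemoB>(2) == 12, "12 is already a multiple of 4");
```

A make_record built on this would allocate record_size<A, B>(a_count) bytes and construct B at buf + b_offset<A, B>(a_count) instead of directly after the last A. Note that the distance from an A to B then depends on the padding, so A can no longer find B from its index and sizeof(A) alone.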

The problem seems to arise when one child object depends on multiple parents. In your case, using raw pointers such as

const B* A::getB() const 
{ 
  return (B*)(this + index + 1); 
}

or

const B* A::getB() const 
{ 
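  // byte arithmetic goes through const char*: standard C++ does not allow arithmetic on void*
  return (const B*)((const char*)this + sizeof(A) * (index + 1)); 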
}

should yield exactly the pointer arithmetic you want to achieve. What I understood from this doc is the following (example taken from there):

class Base1 {public: virtual ~Base1() {}};
class Base2 {public: virtual ~Base2() {}};
class Derived: public Base1, public Base2 {public: virtual ~Derived() {}};

// ...
Derived obj;
Derived* dp = &obj;
Base1* b1p = dp;
Base2* b2p = dp; // [1]
Derived* dps = static_cast<Derived*>(b2p); // [2]
Derived* dpr = reinterpret_cast<Derived*>(b2p); // [3]

dp is a pointer to the Derived object, whose layout is basically something like a concatenation of Base1, Base2 and Derived, in that order:

---- address 1: used by Derived and Base1
---- members of Base1 (roughly sizeof(Base1))
---- address 2: used by Base2
---- members of Base2 (roughly sizeof(Base2))
---- members of Derived

(I really think this is completely implementation-specific, but it is my understanding of the layout.)

If you would like to point to the parent Base2 subobject within the Derived object, the assignment on line [1] converts correctly to the address of the Base2 subobject. The static_cast on line [2] gets back to the original value using the hierarchy known at compile time. The reinterpret_cast on line [3], on the other hand, performs no address adjustment; since it operates on a pointer to Base2, it yields an erroneous pointer to Derived in dpr.
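The behavior of lines [1] and [2] can be checked with a small sketch. The function names below are hypothetical. The sketch deliberately avoids using the result of the reinterpret_cast from line [3], since dereferencing it would be undefined behavior.

```cpp
#include <cassert>

class Base1 { public: virtual ~Base1() {} };
class Base2 { public: virtual ~Base2() {} };
class Derived : public Base1, public Base2 { public: virtual ~Derived() {} };

// True when Derived* -> Base2* -> Derived* via static_cast restores the
// original pointer, as in lines [1] and [2].
bool static_cast_round_trips(Derived* dp)
{
    Base2* b2p = dp;  // implicit upcast; the compiler may adjust the address
    return static_cast<Derived*>(b2p) == dp;
}

// True when the two base subobjects live at different addresses, which is
// why at least one upcast must adjust the pointer value.
bool bases_have_distinct_addresses(Derived* dp)
{
    Base1* b1p = dp;
    Base2* b2p = dp;
    return static_cast<void*>(b1p) != static_cast<void*>(b2p);
}
```

Because both bases are non-empty, they cannot share an address, so the Derived object can coincide with at most one of them. A reinterpret_cast from the other base pointer keeps the unadjusted address and therefore does not point at the start of the Derived object.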

Coming back to your initial question, I do not think you will have any issue as long as there is no inheritance relationship between your two classes. However, casting to a plain byte pointer and doing explicit pointer arithmetic with sizeof(A) seems more appropriate to me.

I am curious to know to what extent this actually improves performance compared with having an array of A's and a pointer to the unique B.
