Standard-layout and tail padding

David Hollman recently tweeted the following example (which I've slightly reduced):

struct FooBeforeBase {
    double d;
    bool b[4];
};

struct FooBefore : FooBeforeBase {
    float value;
};

static_assert(sizeof(FooBefore) > 16);

//----------------------------------------------------

struct FooAfterBase {
protected:
    double d;
public:  
    bool b[4];
};

struct FooAfter : FooAfterBase {
    float value;
};

static_assert(sizeof(FooAfter) == 16);

You can examine the layout in clang on godbolt and see that the size changed because in FooBefore the member value is placed at offset 16 (maintaining a full alignment of 8 from FooBeforeBase), whereas in FooAfter the member value is placed at offset 12 (effectively using FooAfterBase's tail padding).
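The offsets can also be checked without a layout dump. The following is a minimal sketch, not part of the original example: value_offset and check_offsets are hypothetical helpers that measure where value sits by subtracting addresses, which avoids offsetof on non-standard-layout types. The concrete numbers (16 versus 12) are what the Itanium ABI is expected to produce on a typical 64-bit target; the assertion in the code only states a relation that should also hold under MSVC's layout.

```cpp
#include <cassert>
#include <cstddef>

struct FooBeforeBase {
    double d;
    bool b[4];
};

struct FooBefore : FooBeforeBase {
    float value;
};

struct FooAfterBase {
protected:
    double d;
public:
    bool b[4];
};

struct FooAfter : FooAfterBase {
    float value;
};

// Hypothetical helper (not from the question): byte offset of the member
// `value` inside an object of type T, measured by address subtraction.
template <class T>
std::size_t value_offset() {
    T obj{};
    return static_cast<std::size_t>(
        reinterpret_cast<const char*>(&obj.value) -
        reinterpret_cast<const char*>(&obj));
}

// Hypothetical sanity check: reusing tail padding can only move `value`
// to a lower offset, never a higher one.
void check_offsets() {
    assert(value_offset<FooAfter>() <= value_offset<FooBefore>());
}
```

Calling value_offset<FooBefore>() and value_offset<FooAfter>() and printing the results alongside sizeof for both types shows whether a given compiler reuses the base's tail padding.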

It is clear to me that FooBeforeBase is standard-layout, but FooAfterBase is not (because its non-static data members do not all have the same access control, [class.prop]/3). But what is it about FooBeforeBase being standard-layout that requires its padding bytes to be respected?

Both gcc and clang reuse FooAfterBase's padding, ending up with sizeof(FooAfter) == 16. But MSVC does not, ending up with 24. Is there a required layout per the standard and, if not, why do gcc and clang do what they do?


There is some confusion, so just to clear up:

  • FooBeforeBase is standard-layout
  • FooBefore is not (both it and a base class have non-static data members, similar to E in this example)
  • FooAfterBase is not (it has non-static data members of differing access)
  • FooAfter is not (for both of the above reasons)
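These four claims can be verified at compile time with the standard type traits. A minimal sketch, repeating the class definitions from the question so that it stands alone; the assertion messages are my own wording:

```cpp
#include <type_traits>

struct FooBeforeBase {
    double d;
    bool b[4];
};

struct FooBefore : FooBeforeBase {
    float value;
};

struct FooAfterBase {
protected:
    double d;
public:
    bool b[4];
};

struct FooAfter : FooAfterBase {
    float value;
};

// Only the first class is standard-layout.
static_assert(std::is_standard_layout<FooBeforeBase>::value,
              "single class, all members have the same access");
static_assert(!std::is_standard_layout<FooBefore>::value,
              "base and derived class both declare data members");
static_assert(!std::is_standard_layout<FooAfterBase>::value,
              "data members with differing access control");
static_assert(!std::is_standard_layout<FooAfter>::value,
              "both of the reasons above");

// All four remain trivially copyable, which is a separate property.
static_assert(std::is_trivially_copyable<FooBefore>::value, "");
static_assert(std::is_trivially_copyable<FooAfter>::value, "");
```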

The answer to this question doesn't come from the standard but rather from the Itanium ABI (which is why gcc and clang have one behavior but MSVC does something else). That ABI defines a layout, the relevant parts of which for the purposes of this question are:

For purposes internal to the specification, we also specify:

  • dsize(O): the data size of an object, which is the size of O without tail padding.

and

We ignore tail padding for PODs because an early version of the standard did not allow us to use it for anything else and because it sometimes permits faster copying of the type.

Where the placement of members other than virtual base classes is defined as:

Start at offset dsize(C), incremented if necessary for alignment to nvalign(D) for base classes or to align(D) for data members. Place D at this offset unless [... not relevant ...].

The term POD has disappeared from the C++ standard, but it means standard-layout and trivial (in particular, trivially copyable). In this question, FooBeforeBase is a POD. The Itanium ABI ignores tail padding for PODs, that is, it treats the padding as part of the data - hence dsize(FooBeforeBase) is 16.

But FooAfterBase is not a POD (it is trivially copyable, but it is not standard-layout). As a result, its tail padding is not counted as data, so dsize(FooAfterBase) is just 12, and the float can go right there.
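The arithmetic behind these numbers can be written out. The following is only a simplified model of the quoted placement rule, not the ABI's full algorithm: align_up, member_offset and total_size are hypothetical names, the conditions elided above are ignored, and the sizes and alignments (4-byte float, 8-byte class alignment) are assumptions for a typical 64-bit target.

```cpp
#include <cstddef>

// Round `offset` up to the next multiple of `alignment`.
constexpr std::size_t align_up(std::size_t offset, std::size_t alignment) {
    return (offset + alignment - 1) / alignment * alignment;
}

// Simplified model: a data member is placed at dsize(base), rounded up
// to the member's alignment.
constexpr std::size_t member_offset(std::size_t base_dsize,
                                    std::size_t member_align) {
    return align_up(base_dsize, member_align);
}

// Size of the derived class: end of the new member, rounded up to the
// alignment of the whole class.
constexpr std::size_t total_size(std::size_t base_dsize,
                                 std::size_t member_size,
                                 std::size_t member_align,
                                 std::size_t class_align) {
    return align_up(member_offset(base_dsize, member_align) + member_size,
                    class_align);
}

// FooBefore: dsize(FooBeforeBase) is 16, so the float lands at offset 16.
static_assert(member_offset(16, 4) == 16, "value at offset 16");
static_assert(total_size(16, 4, 4, 8) == 24, "sizeof(FooBefore) is 24");

// FooAfter: dsize(FooAfterBase) is 12, so the float lands at offset 12.
static_assert(member_offset(12, 4) == 12, "value at offset 12");
static_assert(total_size(12, 4, 4, 8) == 16, "sizeof(FooAfter) is 16");
```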

This has interesting consequences. As pointed out by Quuxplusone in a related answer, implementors also typically assume that tail padding isn't reused, which wreaks havoc on this example:

#include <algorithm>
#include <stdio.h>

struct A {
    int m_a;
};

struct B : A {
    int m_b1;
    char m_b2;
};

struct C : B {
    short m_c;
};

int main() {
    C c1 { 1, 2, 3, 4 };
    B& b1 = c1;
    B b2 { 5, 6, 7 };

    printf("before operator=: %d\n", int(c1.m_c));  // 4
    b1 = b2;
    printf("after operator=: %d\n", int(c1.m_c));  // 4

    printf("before std::copy: %d\n", int(c1.m_c));  // 4
    std::copy(&b2, &b2 + 1, &b1);
    printf("after std::copy: %d\n", int(c1.m_c));  // 64, or 0, or anything but 4
}

Here, = does the right thing (it does not overwrite B's tail padding), but copy() has a library optimization that reduces to memmove() - which does not care about tail padding because it assumes it does not exist.

FooBefore derived;
FooBeforeBase src, &dst=derived;
....
memcpy(&dst, &src, sizeof(dst));

If the additional data member were placed in the hole, memcpy would overwrite it.

As is correctly pointed out in comments, the standard doesn't require that this memcpy invocation should work. However, the Itanium ABI is seemingly designed with this case in mind. Perhaps the ABI rules are specified this way in order to make mixed-language programming a bit more robust, or to preserve some kind of backwards compatibility.

Relevant ABI rules can be found here.

A related answer can be found here (this question might be a duplicate of that one).

Here is a concrete case which demonstrates why the second case cannot reuse the padding:

union bob {
  FooBeforeBase a;
  FooBefore b;
};

bob.b.value = 3.14;
memset( &bob.a, 0, sizeof(bob.a) );

This cannot clear bob.b.value.

union bob2 {
  FooAfterBase a;
  FooAfter b;
};

bob2.b.value = 3.14;
memset( &bob2.a, 0, sizeof(bob2.a) );

This is undefined behavior.

FooBefore is not standard-layout either: two classes in its hierarchy (FooBefore and FooBeforeBase) declare non-static data members. Thus the compiler is allowed to place some data members arbitrarily, and hence the differences between tool-chains arise. In a standard-layout hierarchy, at most one class (either the most derived class or at most one intermediate class) shall declare non-static data members.

Here's a case similar to the one in nm's answer.

First, let's have a function which clears a FooBeforeBase:

void clearBase(FooBeforeBase *f) {
    memset(f, 0, sizeof(*f));
}

This is fine: clearBase gets a pointer to FooBeforeBase, and since FooBeforeBase is standard-layout, it assumes that memsetting it is safe.

Now, if you do this:

FooBefore b;
b.value = 42;
clearBase(&b);

You don't expect clearBase to clear b.value, as b.value is not part of FooBeforeBase. But if FooBefore::value were put into the tail padding of FooBeforeBase, it would have been cleared as well.
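Put together as one compilable unit, the scenario looks like the sketch below. valueSurvivesClear is a hypothetical helper added for illustration. Note the assumption: the standard does not promise that writing sizeof(FooBeforeBase) bytes through a pointer to a base subobject leaves the derived members intact, so this only shows the behavior that the Itanium ABI (and MSVC's layout) is expected to give for this POD base.

```cpp
#include <cassert>
#include <cstring>

struct FooBeforeBase {
    double d;
    bool b[4];
};

struct FooBefore : FooBeforeBase {
    float value;
};

// Zero every byte in the footprint of the base class, padding included.
void clearBase(FooBeforeBase* f) {
    std::memset(f, 0, sizeof(*f));
}

// Hypothetical check (not from the answer): clear the base subobject of a
// FooBefore and report whether the base was zeroed while `value` was kept.
// Assumes `value` is placed after the base's tail padding, as described above.
bool valueSurvivesClear() {
    FooBefore b;
    b.d = 1.5;
    b.b[0] = true;
    b.value = 42.0f;
    clearBase(&b);
    return b.value == 42.0f && b.d == 0.0 && !b.b[0];
}
```

The same experiment with FooAfterBase and FooAfter would not be meaningful: FooAfterBase is not standard-layout, and on gcc and clang the memset would overwrite value.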

Is there a required layout per the standard and, if not, why do gcc and clang do what they do?

No, reusing tail padding is not required. It is an optimization that gcc and clang perform.
