Does std::vector<Simd_wrapper> have contiguous data in memory?

class Wrapper {
public:
    // some functions operating on the value_
    __m128i value_;
};

int main() {
    std::vector<Wrapper> a;
    a.resize(100);
}

Would the value_ member of the Wrapper objects in the vector a always occupy contiguous memory, without any gaps between the __m128i values?

I mean:

[128 bit for 1st Wrapper][no gap here][128bit for 2nd Wrapper] ...

So far, this seems to be true for g++ on the Intel CPU I am using, and for gcc on Godbolt.

Since there is only a single __m128i member in the Wrapper object, does that mean the compiler never needs to add any kind of padding in memory? (Related: Memory layout of vector of POD objects)

Test code 1:

#include <cstdint>
#include <iostream>
#include <vector>
#include <x86intrin.h>

int main()
{
  static constexpr size_t N = 1000;
  std::vector<__m128i> a;
  a.resize(N);
  //__m128i a[1000];
  uint32_t* ptr_a = reinterpret_cast<uint32_t*>(a.data());
  for (size_t i = 0; i < 4*N; ++i)
    ptr_a[i] = i;
  for (size_t i = 1; i < N; ++i){
    a[i-1] = _mm_and_si128 (a[i], a[i-1]);
  }
  for (size_t i = 0; i < 4*N; ++i)
    std::cout << ptr_a[i];
}

Warning:

warning: ignoring attributes on template argument 
'__m128i {aka __vector(2) long long int}'
[-Wignored-attributes]

Assembly (gcc, Godbolt):

.L9:
        add     rax, 16
        movdqa  xmm1, XMMWORD PTR [rax]
        pand    xmm0, xmm1
        movaps  XMMWORD PTR [rax-16], xmm0
        cmp     rax, rdx
        movdqa  xmm0, xmm1
        jne     .L9

I guess this means the data is contiguous, because the loop just adds 16 bytes to the memory address it reads on every iteration. It uses pand to do the bitwise AND.

Test code 2:

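#include <cstdint>
#include <iostream>
#include <vector>
#include <x86intrin.h>

class Wrapper {
public:
    __m128i value_;
    inline Wrapper& operator &= (const Wrapper& rhs)
    {
        value_ = _mm_and_si128(value_, rhs.value_);
        return *this;  // was missing: flowing off the end of a non-void function is undefined behavior
    }
}; // Wrapper
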
int main()
{
  static constexpr size_t N = 1000;
  std::vector<Wrapper> a;
  a.resize(N);
  //__m128i a[1000];
  uint32_t* ptr_a = reinterpret_cast<uint32_t*>(a.data());
  for (size_t i = 0; i < 4*N; ++i) ptr_a[i] = i;
  for (size_t i = 1; i < N; ++i){
    a[i-1] &=a[i];
    //std::cout << ptr_a[i];
  }
  for (size_t i = 0; i < 4*N; ++i)
    std::cout << ptr_a[i];
}

Assembly (gcc, Godbolt):

.L9:
        add     rdx, 2
        add     rax, 32
        movdqa  xmm1, XMMWORD PTR [rax-16]
        pand    xmm0, xmm1
        movaps  XMMWORD PTR [rax-32], xmm0
        movdqa  xmm0, XMMWORD PTR [rax]
        pand    xmm1, xmm0
        movaps  XMMWORD PTR [rax-16], xmm1
        cmp     rdx, 999
        jne     .L9

There appears to be no padding here either: rax increases by 32 in each step, which is 2 x 16, because the loop is unrolled by two. The extra add rdx,2 makes this loop slightly worse than the one from test code 1.

Test auto-vectorization:

#include <cstdint>
#include <iostream>
#include <vector>
#include <x86intrin.h>

int main()
{
  static constexpr size_t N = 1000;
  std::vector<__m128i> a;
  a.resize(N);
  //__m128i a[1000];
  uint32_t* ptr_a = reinterpret_cast<uint32_t*>(a.data());
  for (size_t i = 0; i < 4*N; ++i)
    ptr_a[i] = i;
  for (size_t i = 1; i < N; ++i){
    a[i-1] = _mm_and_si128 (a[i], a[i-1]);
  }
  for (size_t i = 0; i < 4*N; ++i)
    std::cout << ptr_a[i];
}

Assembly (Godbolt):

.L21:
        movdqu  xmm0, XMMWORD PTR [r10+rax]
        add     rdi, 1
        pand    xmm0, XMMWORD PTR [r8+rax]
        movaps  XMMWORD PTR [r8+rax], xmm0
        add     rax, 16
        cmp     rsi, rdi
        ja      .L21

... I just don't know if this is always true for Intel CPUs and g++, the Intel C++ compiler, or any other compiler.

There is no guarantee that there won't be padding at the end of the class Wrapper, only that there won't be padding at its beginning.

According to the C++11 Standard:

9.2 Class members [class.mem]

20 A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [ Note: There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning, as necessary to achieve appropriate alignment. — end note ]

Also under sizeof:

5.3.3 Sizeof [expr.sizeof]

2 When applied to a reference or a reference type, the result is the size of the referenced type. When applied to a class, the result is the number of bytes in an object of that class including any padding required for placing objects of that type in an array.

It isn't guaranteed. Galik's answer quotes the standard, so I'll focus on some of the risks of assuming that the data will be contiguous.

I wrote this small program and compiled it with gcc, and it did put the integers contiguously:

#include <iostream>
#include <vector>

class A
{
public:
  int a;
  int method() { return 1;}
  float method2() { return 5.5; }
};

int main()
{
  std::vector<A> as;
  for(int i = 0; i < 10; i++)
  {
     as.push_back(A()); 
  }
  for(int i = 0; i < 10; i++)
  {
     std::cout << &as[i] << std::endl; 
  }
}

However, with one small change, gaps started appearing:

#include <iostream>
#include <vector>

class A
{
public:
  int a;
  int method() { return 1;}
  float method2() { return 5.5; }
  virtual double method3() { return 0.1; } //this is the only change
};

int main()
{
  std::vector<A> as;
  for(int i = 0; i < 10; i++)
  {
     as.push_back(A()); 
  }
  for(int i = 0; i < 10; i++)
  {
     std::cout << &as[i] << std::endl; 
  }
}

Objects with virtual methods (or that inherit from classes with virtual methods) need to store a little extra information to find the appropriate method, because which implementation to call, the base class's or one of the overrides, is not known until runtime. This is why it is advised to never use memset on a class. As other answers point out, there may be padding there too, which isn't guaranteed to be consistent across compilers or even across different versions of the same compiler.

In the end, it is probably not worth assuming that the data will be contiguous on a given compiler. Even if you test it and it works, simple things like adding a virtual method later will cause you a massive headache.

No padding is safe to assume in practice, unless you're compiling for a non-standard ABI.

All compilers targeting the same ABI must make the same choices about struct/class sizes and layouts, and none of the standard ABIs / calling conventions (i.e. x86-32 and x86-64 System V and Windows; see the tag wiki for links) put padding in your struct. Your experiments with one compiler therefore confirm it for all compilers targeting the same platform/ABI.

Note that the scope of this question is limited to x86 compilers that support Intel's intrinsics and the __m128i type, which means we have much stronger guarantees than what you get from the ISO C++ standard alone, without any implementation-specific behavior.


As @zneak points out, you can put static_assert(std::is_standard_layout<Wrapper>::value) in the class definition to remind people not to add any virtual methods, which would add a vtable pointer to each instance.
