
Modern approach to making std::vector allocate aligned memory

A related question exists, but its answers are old, and a comment from user Marc Glisse suggests that C++17 introduced new approaches to this problem that may not have been adequately discussed there.

I'm trying to get aligned memory working properly for SIMD, while still having access to all of the data.

On Intel, if I create a vector whose element type is __m256 and reduce its size by a factor of 8, it gives me aligned memory.

E.g. std::vector<__m256> mvec_a((N*M)/8);

In a slightly hacky way, I can cast pointers to vector elements to float*, which allows me to access individual float values.

Instead, I would prefer to have a std::vector<float> which is correctly aligned, and thus can be loaded into __m256 and other SIMD types without segfaulting.

I've been looking into aligned_alloc.

This can give me a C-style array that is correctly aligned:

    auto align_sz = static_cast<std::size_t>(32);
    float* marr_a = (float*)aligned_alloc(align_sz, N*M*sizeof(float));
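As a side note to the snippet above, here is a minimal sketch of how such a buffer could be owned safely without std::vector. It assumes a platform that provides std::aligned_alloc (C++17; MSVC does not provide it). The names FreeDeleter, AlignedFloats, make_aligned_floats and is_aligned are hypothetical helpers invented for this illustration, not part of the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <new>

// Deleter that releases the buffer with std::free, the matching call
// for memory obtained from std::aligned_alloc.
struct FreeDeleter {
    void operator()(void* p) const noexcept { std::free(p); }
};

// Hypothetical alias: an owning pointer to an aligned float array.
using AlignedFloats = std::unique_ptr<float[], FreeDeleter>;

// Hypothetical helper: allocate `count` floats aligned to `alignment` bytes.
inline AlignedFloats make_aligned_floats(std::size_t count,
                                         std::size_t alignment = 32) {
    // The alignment must be a power of two.
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    // std::aligned_alloc expects the size to be a multiple of the
    // alignment, so round the byte count up.
    std::size_t bytes = count * sizeof(float);
    bytes = (bytes + alignment - 1) / alignment * alignment;

    void* raw = std::aligned_alloc(alignment, bytes);
    if (raw == nullptr) {
        throw std::bad_alloc();
    }
    return AlignedFloats(static_cast<float*>(raw));
}

// Returns true if `p` sits on an `alignment`-byte boundary.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

This gives RAII ownership and operator[] access, but none of the std::vector interface (size, resizing, iterators), which is why the custom allocator route is still needed for a real std::vector<float>.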

However, I'm unsure how to do this for std::vector<float>. Giving the std::vector<float> ownership of marr_a doesn't seem to be possible.

I've seen some suggestions that I should write a custom allocator, but this seems like a lot of work. Perhaps modern C++ offers a better way?

STL containers take an allocator template argument which can be used to align their internal buffers. The specified allocator type has to implement at least allocate, deallocate, and value_type.

In contrast to these answers, this implementation of such an allocator avoids platform-dependent aligned malloc calls. Instead, it uses the C++17 aligned new operator.
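To show the C++17 aligned allocation functions in isolation, here is a small sketch that calls them directly, outside of any allocator. The helper names allocate_aligned_floats, free_aligned_floats and is_aligned_to are hypothetical and exist only for this illustration; the sketch assumes a compiler in C++17 mode or later.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical helper: raw storage for `count` floats at the requested
// alignment, obtained from the C++17 aligned overload of operator new.
inline float* allocate_aligned_floats(std::size_t count,
                                      std::align_val_t alignment) {
    void* raw = ::operator new(count * sizeof(float), alignment);
    return static_cast<float*>(raw);
}

// The matching release: memory from the aligned operator new must go
// back through the aligned operator delete with the same alignment.
inline void free_aligned_floats(float* p, std::align_val_t alignment) noexcept {
    ::operator delete(p, alignment);
}

// Returns true if `p` sits on an `alignment`-byte boundary.
inline bool is_aligned_to(const void* p, std::size_t alignment) {
    assert(alignment != 0);
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

A call such as allocate_aligned_floats(16, std::align_val_t{64}) should return a pointer on a 64-byte boundary. The allocator in this answer wraps the same pair of calls so that std::vector can use them.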

The full example is available on Godbolt.

    #include <cstddef>
    #include <limits>
    #include <new>

    /**
     * Returns aligned pointers when allocations are requested. Default alignment
     * is 64B = 512b, sufficient for AVX-512 and most cache line sizes.
     *
     * @tparam ALIGNMENT_IN_BYTES Must be a positive power of 2.
     */
    template<typename ElementType,
             std::size_t ALIGNMENT_IN_BYTES = 64>
    class AlignedAllocator
    {
    private:
        static_assert(
            ALIGNMENT_IN_BYTES >= alignof( ElementType ),
            "Beware that types like int have minimum alignment requirements "
            "or access will result in crashes."
        );

    public:
        using value_type = ElementType;
        static std::align_val_t constexpr ALIGNMENT{ ALIGNMENT_IN_BYTES };

        /**
         * This is only necessary because AlignedAllocator has a second template
         * argument for the alignment that will make the default
         * std::allocator_traits implementation fail during compilation.
         * @see https://stackoverflow.com/a/48062758/2191065
         */
        template<class OtherElementType>
        struct rebind
        {
            using other = AlignedAllocator<OtherElementType, ALIGNMENT_IN_BYTES>;
        };

        [[nodiscard]] ElementType*
        allocate( std::size_t nElementsToAllocate )
        {
            /* Guard against overflow when converting the element count to bytes. */
            if ( nElementsToAllocate
                 > std::numeric_limits<std::size_t>::max() / sizeof( ElementType ) ) {
                throw std::bad_array_new_length();
            }

            auto const nBytesToAllocate = nElementsToAllocate * sizeof( ElementType );
            return reinterpret_cast<ElementType*>(
                ::operator new[]( nBytesToAllocate, ALIGNMENT ) );
        }

        void
        deallocate( ElementType* allocatedPointer,
                    [[maybe_unused]] std::size_t nElementsAllocated )
        {
            /* According to the C++20 draft n4868 § 17.6.3.3, the delete operator
             * must be called with the same alignment argument as the new expression.
             * The size argument can be omitted but if present must also be equal to
             * the one used in new. */
            ::operator delete[]( allocatedPointer, ALIGNMENT );
        }
    };

This allocator can then be used like this:

    #include <cstdint>
    #include <iostream>
    #include <stdexcept>
    #include <vector>

    template<typename T, std::size_t ALIGNMENT_IN_BYTES = 64>
    using AlignedVector = std::vector<T, AlignedAllocator<T, ALIGNMENT_IN_BYTES> >;

    int
    main()
    {
        AlignedVector<int, 1024> buffer( 3333 );
        /* Check that the vector's storage starts on a 1024-byte boundary. */
        if ( reinterpret_cast<std::uintptr_t>( buffer.data() ) % 1024 != 0 ) {
            std::cerr << "Vector buffer is not aligned!\n";
            throw std::logic_error( "Faulty implementation!" );
        }

        std::cout << "Successfully allocated an aligned std::vector!\n";
        return 0;
    }

All containers in the standard C++ library, including vectors, have an optional template parameter that specifies the container's allocator, and it is not really a lot of work to implement your own:

    class my_awesome_allocator {
    };

    std::vector<float, my_awesome_allocator> awesomely_allocated_vector;

You will have to write a little bit of code that implements your allocator, but it won't be much more code than you have already written. If you don't need pre-C++17 support, you only need to implement the allocate() and deallocate() methods (along with a value_type member type); std::allocator_traits supplies defaults for the rest.
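To make the "little bit of code" concrete, here is a minimal sketch of what such an allocator could look like. It is not the my_awesome_allocator from this answer, which is left empty above. The name Aligned32Allocator, the fixed 32-byte alignment and the data_is_aligned helper are assumptions chosen for illustration, and the sketch relies on the C++17 aligned operator new.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <new>
#include <vector>

// Hypothetical minimal allocator: every allocation is 32-byte aligned,
// which is enough for __m256 loads.
template <typename T>
struct Aligned32Allocator {
    using value_type = T;
    static constexpr std::align_val_t alignment{32};
    static_assert(alignof(T) <= 32, "element type needs stricter alignment");

    Aligned32Allocator() noexcept = default;

    // Converting constructor so containers can rebind to other types.
    template <typename U>
    Aligned32Allocator(const Aligned32Allocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        // Guard against overflow in the byte count.
        if (n > std::numeric_limits<std::size_t>::max() / sizeof(T)) {
            throw std::bad_array_new_length();
        }
        return static_cast<T*>(::operator new(n * sizeof(T), alignment));
    }

    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p, alignment);
    }
};

// The allocator is stateless, so all instances compare equal.
template <typename T, typename U>
bool operator==(const Aligned32Allocator<T>&, const Aligned32Allocator<U>&) noexcept {
    return true;
}

template <typename T, typename U>
bool operator!=(const Aligned32Allocator<T>&, const Aligned32Allocator<U>&) noexcept {
    return false;
}

// Hypothetical check: does the vector's buffer start on a 32-byte boundary?
inline bool data_is_aligned(const std::vector<float, Aligned32Allocator<float>>& v) {
    assert(!v.empty());
    return reinterpret_cast<std::uintptr_t>(v.data()) % 32 == 0;
}
```

With this in place, std::vector<float, Aligned32Allocator<float>> behaves like an ordinary vector, and its buffer should stay aligned across reallocations because every reallocation goes through allocate(). Since the sketch has a single template parameter, no explicit rebind member is needed.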
