Modern approach to making std::vector allocate aligned memory

Question

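The following question is related, but the answers are old, and a comment from user Marc Glisse suggests that new approaches to this problem have existed since C++17 and may not have been adequately discussed.

I'm trying to get memory aligned so that SIMD works properly, while still having access to all of the data.

On Intel, if I create a vector of type __m256 and reduce my size by a factor of 8, it gives me aligned memory.

For example: std::vector<__m256> mvec_a((N*M)/8);

In a slightly hacky way, I can cast pointers to the vector elements to float, which lets me access the individual float values.

Instead, I would prefer to have a correctly aligned std::vector<float>, which can therefore be loaded into __m256 and other SIMD types without segfaulting.

I've been looking into aligned_alloc.

This can give me a correctly aligned C-style array: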

    auto align_sz = static_cast<std::size_t>(32);
    float* marr_a = (float*)aligned_alloc(align_sz, N*M*sizeof(float));

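However, I'm unsure how to do this for std::vector<float>. Giving the std::vector<float> ownership of marr_a doesn't seem to be possible.

I've seen some suggestions that I should write a custom allocator, but this seems like a lot of work. Perhaps modern C++ offers a better way?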
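For reference, the C-style buffer from the question can at least be given automatic ownership without a custom allocator, at the cost of losing the std::vector interface. The following is only a minimal sketch under stated assumptions: the names FreeDeleter, AlignedFloatBuffer and make_aligned_floats are hypothetical, and std::aligned_alloc is assumed to be available (it is part of C++17 but is not provided by every standard library, MSVC's for instance). The byte count is rounded up because aligned_alloc has historically required the size to be a multiple of the alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <new>

// Deleter that releases the buffer with std::free, the matching call for
// std::aligned_alloc. (Hypothetical name, not from the original post.)
struct FreeDeleter {
    void operator()(void* p) const noexcept { std::free(p); }
};

using AlignedFloatBuffer = std::unique_ptr<float[], FreeDeleter>;

// Hypothetical helper: returns storage for at least `count` floats whose
// address is a multiple of `alignment` bytes.
inline AlignedFloatBuffer make_aligned_floats(std::size_t count,
                                              std::size_t alignment = 32) {
    // The alignment must be a power of two and at least alignof(float).
    assert(alignment >= alignof(float) && (alignment & (alignment - 1)) == 0);

    // Round the byte count up to a multiple of the alignment.
    std::size_t bytes = count * sizeof(float);
    bytes = (bytes + alignment - 1) / alignment * alignment;
    if (bytes == 0) {
        bytes = alignment;  // avoid a zero-sized request
    }

    void* raw = std::aligned_alloc(alignment, bytes);
    if (raw == nullptr) {
        throw std::bad_alloc();
    }
    return AlignedFloatBuffer(static_cast<float*>(raw));
}
```

A buffer obtained this way is indexed like an array (buf[i]) and is freed when it goes out of scope, but it has no size(), push_back() or iterators, which is why the answers below reach for an allocator instead.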

Answer 1

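STL containers take an allocator template argument which can be used to align their internal buffers. The specified allocator type has to implement at least allocate, deallocate, and value_type.

In contrast to these answers, this implementation of such an allocator avoids platform-dependent aligned malloc calls. Instead, it uses the C++17 aligned new operator.

Here is the full example on Godbolt.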

    #include <cstddef>
    #include <limits>
    #include <new>

    /**
     * Returns aligned pointers when allocations are requested. Default alignment
     * is 64B = 512b, sufficient for AVX-512 and most cache line sizes.
     *
     * @tparam ALIGNMENT_IN_BYTES Must be a positive power of 2.
     */
    template<typename ElementType,
             std::size_t ALIGNMENT_IN_BYTES = 64>
    class AlignedAllocator
    {
    private:
        static_assert(
            ALIGNMENT_IN_BYTES >= alignof( ElementType ),
            "Beware that types like int have minimum alignment requirements "
            "or access will result in crashes."
        );

    public:
        using value_type = ElementType;
        static std::align_val_t constexpr ALIGNMENT{ ALIGNMENT_IN_BYTES };

        /**
         * This is only necessary because AlignedAllocator has a second template
         * argument for the alignment that will make the default
         * std::allocator_traits implementation fail during compilation.
         * @see https://stackoverflow.com/a/48062758/2191065
         */
        template<class OtherElementType>
        struct rebind
        {
            using other = AlignedAllocator<OtherElementType, ALIGNMENT_IN_BYTES>;
        };

        [[nodiscard]] ElementType*
        allocate( std::size_t nElementsToAllocate )
        {
            // Guard against overflow when converting the element count to bytes.
            if ( nElementsToAllocate
                 > std::numeric_limits<std::size_t>::max() / sizeof( ElementType ) ) {
                throw std::bad_array_new_length();
            }

            auto const nBytesToAllocate = nElementsToAllocate * sizeof( ElementType );
            return reinterpret_cast<ElementType*>(
                ::operator new[]( nBytesToAllocate, ALIGNMENT ) );
        }

        void
        deallocate( ElementType* allocatedPointer,
                    [[maybe_unused]] std::size_t nElementsAllocated )
        {
            /* According to the C++20 draft n4868 § 17.6.3.3, the delete operator
             * must be called with the same alignment argument as the new expression.
             * The size argument can be omitted but if present must also be equal to
             * the one used in new. */
            ::operator delete[]( allocatedPointer, ALIGNMENT );
        }
    };

This allocator can then be used like this:

    #include <cstdint>
    #include <iostream>
    #include <stdexcept>
    #include <vector>

    template<typename T, std::size_t ALIGNMENT_IN_BYTES = 64>
    using AlignedVector = std::vector<T, AlignedAllocator<T, ALIGNMENT_IN_BYTES> >;

    int
    main()
    {
        AlignedVector<int, 1024> buffer( 3333 );
        if ( reinterpret_cast<std::uintptr_t>( buffer.data() ) % 1024 != 0 ) {
            std::cerr << "Vector buffer is not aligned!\n";
            throw std::logic_error( "Faulty implementation!" );
        }

        std::cout << "Successfully allocated an aligned std::vector.\n";
        return 0;
    }
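To make the mechanism behind the allocator more concrete, the C++17 aligned operator new it relies on can also be called directly. The following is a minimal sketch, not part of the original answer: the helper names allocate_aligned_floats and free_aligned_floats are hypothetical, and a C++17 (or later) compiler is assumed. The point it illustrates is that the alignment passed to operator delete has to match the one passed to operator new.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical helper: raw storage for `count` floats, aligned to
// `Alignment` bytes, obtained through the C++17 aligned operator new.
template<std::size_t Alignment>
float* allocate_aligned_floats(std::size_t count) {
    static_assert(Alignment >= alignof(float)
                  && (Alignment & (Alignment - 1)) == 0,
                  "Alignment must be a power of two and at least alignof(float).");

    void* raw = ::operator new(count * sizeof(float), std::align_val_t{Alignment});

    // Sanity check: the returned address is a multiple of the alignment.
    assert(reinterpret_cast<std::uintptr_t>(raw) % Alignment == 0);
    return static_cast<float*>(raw);
}

// Hypothetical helper: releases storage from allocate_aligned_floats.
// The alignment argument must be the same one used for the allocation.
template<std::size_t Alignment>
void free_aligned_floats(float* p) noexcept {
    ::operator delete(p, std::align_val_t{Alignment});
}
```

Wrapping exactly this pair of calls in allocate() and deallocate() is what lets std::vector manage the aligned buffer itself.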

Answer 2

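All containers in the standard C++ library, including vectors, have an optional template parameter that specifies the container's allocator, and implementing your own is not a lot of work: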

    class my_awesome_allocator {
    };

    std::vector<float, my_awesome_allocator> awesomely_allocated_vector;

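You will have to write a little bit of code to implement your allocator, but it won't be much more than what you have already written. If you don't need pre-C++17 support, you only need to implement the allocate() and deallocate() methods (along with the value_type member type), and that's it.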
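As a rough illustration of how little code that is, here is a minimal sketch of such an allocator. It is an assumption-laden example rather than the answer's own code: the name Aligned32Allocator and the helper sum_of_ones are hypothetical, the alignment is fixed at 32 bytes (enough for __m256 loads), and C++17 aligned operator new is assumed. The converting constructor and the comparison operators are included because the standard allocator requirements ask for them, even though many implementations compile simple uses without them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <new>
#include <vector>

// Hypothetical minimal allocator with a fixed 32-byte alignment.
template<typename T>
struct Aligned32Allocator {
    using value_type = T;
    static constexpr std::align_val_t alignment{32};
    static_assert(alignof(T) <= 32, "T needs more than 32-byte alignment.");

    Aligned32Allocator() noexcept = default;

    // Lets containers convert between allocators for different element types.
    template<typename U>
    Aligned32Allocator(const Aligned32Allocator<U>&) noexcept {}

    [[nodiscard]] T* allocate(std::size_t n) {
        // Guard against overflow when converting the element count to bytes.
        if (n > std::numeric_limits<std::size_t>::max() / sizeof(T)) {
            throw std::bad_array_new_length();
        }
        return static_cast<T*>(::operator new(n * sizeof(T), alignment));
    }

    void deallocate(T* p, std::size_t) noexcept {
        // Must pass the same alignment that was used for the allocation.
        ::operator delete(p, alignment);
    }
};

// The allocator is stateless, so any two instances are interchangeable.
template<typename T, typename U>
bool operator==(const Aligned32Allocator<T>&, const Aligned32Allocator<U>&) noexcept {
    return true;
}
template<typename T, typename U>
bool operator!=(const Aligned32Allocator<T>&, const Aligned32Allocator<U>&) noexcept {
    return false;
}

// Usage sketch: fills an aligned vector with ones and returns their sum.
inline float sum_of_ones(std::size_t n) {
    std::vector<float, Aligned32Allocator<float>> v(n, 1.0f);
    assert(reinterpret_cast<std::uintptr_t>(v.data()) % 32 == 0);
    float total = 0.0f;
    for (float x : v) {
        total += x;
    }
    return total;
}
```

Because the allocator has a single template parameter, std::allocator_traits can rebind it automatically, so no explicit rebind member is needed here.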

Answer 1
3 2022-02-05 00:11:42

Answer 2
1 2020-02-11 13:28:36