
Modern approach to making std::vector allocate aligned memory

A related question exists, but its answers are old, and a comment from user Marc Glisse suggests that there are new approaches to this problem since C++17 that might not be adequately discussed there.

I'm trying to get aligned memory working properly for SIMD, while still having access to all of the data.

On Intel, if I create a vector whose element type is __m256, and reduce its size by a factor of 8, it gives me aligned memory.

E.g. std::vector<__m256> mvec_a((N*M)/8);

In a slightly hacky way, I can cast pointers to vector elements to float, which allows me to access individual float values.

Instead, I would prefer to have an std::vector<float> which is correctly aligned, and thus can be loaded into __m256 and other SIMD types without segfaulting.
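To make the alignment requirement concrete: an aligned 256-bit load such as _mm256_load_ps needs an address that is a multiple of 32 bytes, while the default allocator of std::vector<float> typically only guarantees the default new alignment (often 16 bytes on x86-64). A minimal sketch of a check for this follows; the helper name is_aligned is hypothetical and not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns true if `ptr` is a multiple of `alignment` bytes.
// `alignment` is assumed to be a non-zero power of two.
inline bool is_aligned(const void* ptr, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// Self-check on a stack buffer whose alignment is known at compile time.
inline void is_aligned_self_check() {
    alignas(32) float buffer[16] = {};
    assert(is_aligned(buffer, 32));       // the start of the buffer is 32-byte aligned
    assert(!is_aligned(buffer + 1, 32));  // 4 bytes further in, it no longer is
}
```

Calling is_aligned(vec.data(), 32) on an ordinary std::vector<float> may or may not return true, which is exactly why relying on it for aligned loads is unsafe.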

I've been looking into aligned_alloc.

This can give me a C-style array that is correctly aligned:

    auto align_sz = static_cast<std::size_t>(32);
    float* marr_a = (float*)aligned_alloc(align_sz, N*M*sizeof(float));

However, I'm unsure how to do this for std::vector<float>. Giving the std::vector<float> ownership of marr_a doesn't seem to be possible.

I've seen some suggestions that I should write a custom allocator, but this seems like a lot of work, and perhaps with modern C++ there is a better way?

STL containers take an allocator template argument which can be used to align their internal buffers. The specified allocator type has to implement at least allocate, deallocate, and value_type.

In contrast to these answers, this implementation of such an allocator avoids platform-dependent aligned malloc calls. Instead, it uses the C++17 aligned new operator.
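For readers who have not used it, here is a minimal sketch of the C++17 aligned operator new on its own, outside of any allocator. The function names and the fixed 32-byte alignment are assumptions made for illustration; the key point is that the matching operator delete must receive the same alignment value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Illustrative alignment (assumption): 32 bytes, enough for AVX loads.
constexpr std::align_val_t kSketchAlignment{ 32 };

// Hypothetical helper: returns storage for `count` floats aligned to 32 bytes.
// Throws std::bad_alloc if the allocation fails.
inline float* allocate_aligned_floats(std::size_t count) {
    void* raw = ::operator new(count * sizeof(float), kSketchAlignment);
    return static_cast<float*>(raw);
}

// Hypothetical helper: releases storage obtained from allocate_aligned_floats.
// The alignment passed here must match the one used for allocation.
inline void free_aligned_floats(float* pointer) noexcept {
    ::operator delete(pointer, kSketchAlignment);
}

// Small self-check: the returned address is a multiple of 32.
inline void aligned_new_self_check() {
    float* data = allocate_aligned_floats(100);
    assert(reinterpret_cast<std::uintptr_t>(data) % 32 == 0);
    free_aligned_floats(data);
}
```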

Here is the full example on godbolt.

    #include <cstddef>
    #include <limits>
    #include <new>

    /**
     * Returns aligned pointers when allocations are requested. Default alignment
     * is 64B = 512b, sufficient for AVX-512 and most cache line sizes.
     *
     * @tparam ALIGNMENT_IN_BYTES Must be a positive power of 2.
     */
    template<typename ElementType,
             std::size_t ALIGNMENT_IN_BYTES = 64>
    class AlignedAllocator
    {
    private:
        static_assert(
            ALIGNMENT_IN_BYTES >= alignof( ElementType ),
            "Beware that types like int have minimum alignment requirements "
            "or access will result in crashes."
        );

    public:
        using value_type = ElementType;
        static std::align_val_t constexpr ALIGNMENT{ ALIGNMENT_IN_BYTES };

        /**
         * This is only necessary because AlignedAllocator has a second template
         * argument for the alignment that will make the default
         * std::allocator_traits implementation fail during compilation.
         * @see https://stackoverflow.com/a/48062758/2191065
         */
        template<class OtherElementType>
        struct rebind
        {
            using other = AlignedAllocator<OtherElementType, ALIGNMENT_IN_BYTES>;
        };

        [[nodiscard]] ElementType*
        allocate( std::size_t nElementsToAllocate )
        {
            if ( nElementsToAllocate
                 > std::numeric_limits<std::size_t>::max() / sizeof( ElementType ) ) {
                throw std::bad_array_new_length();
            }

            auto const nBytesToAllocate = nElementsToAllocate * sizeof( ElementType );
            return reinterpret_cast<ElementType*>(
                ::operator new[]( nBytesToAllocate, ALIGNMENT ) );
        }

        void
        deallocate( ElementType* allocatedPointer,
                    [[maybe_unused]] std::size_t nElementsAllocated )
        {
            /* According to the C++20 draft n4868 § 17.6.3.3, the delete operator
             * must be called with the same alignment argument as the new expression.
             * The size argument can be omitted but if present must also be equal to
             * the one used in new. */
            ::operator delete[]( allocatedPointer, ALIGNMENT );
        }
    };

This allocator can then be used like this:

    #include <cstdint>
    #include <iostream>
    #include <stdexcept>
    #include <vector>

    template<typename T, std::size_t ALIGNMENT_IN_BYTES = 64>
    using AlignedVector = std::vector<T, AlignedAllocator<T, ALIGNMENT_IN_BYTES> >;

    int
    main()
    {
        AlignedVector<int, 1024> buffer( 3333 );
        if ( reinterpret_cast<std::uintptr_t>( buffer.data() ) % 1024 != 0 ) {
            std::cerr << "Vector buffer is not aligned!\n";
            throw std::logic_error( "Faulty implementation!" );
        }

        std::cout << "Successfully allocated an aligned std::vector.\n";
        return 0;
    }

All containers in the standard C++ library, including vectors, have an optional template parameter that specifies the container's allocator, and it is not really a lot of work to implement your own:

    class my_awesome_allocator {
    };

    std::vector<float, my_awesome_allocator> awesomely_allocated_vector;

You will have to write a little bit of code that implements your allocator, but it wouldn't be much more code than you have already written. If you don't need pre-C++17 support, you only need to implement the allocate() and deallocate() methods (plus a value_type member type), that's it.
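As a rough illustration of how little is needed, here is a minimal sketch of such an allocator. The name MinimalAlignedAllocator and the fixed 32-byte alignment are assumptions made for this example, not part of the answer above. It uses the C++17 aligned operator new, and because it has only a single type template parameter, no explicit rebind is required. The converting constructor and the comparison operators are included because containers may copy and compare allocators.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <new>
#include <vector>

// Hypothetical minimal allocator; the alignment is fixed at 32 bytes (assumption).
template <typename T>
struct MinimalAlignedAllocator {
    using value_type = T;
    static constexpr std::align_val_t kAlignment{ 32 };
    static_assert(alignof(T) <= 32, "T requires stricter alignment than 32 bytes");

    MinimalAlignedAllocator() noexcept = default;

    // Allows containers to convert the allocator to another element type.
    template <typename U>
    MinimalAlignedAllocator(const MinimalAlignedAllocator<U>&) noexcept {}

    [[nodiscard]] T* allocate(std::size_t count) {
        if (count > std::numeric_limits<std::size_t>::max() / sizeof(T)) {
            throw std::bad_array_new_length();
        }
        return static_cast<T*>(::operator new(count * sizeof(T), kAlignment));
    }

    void deallocate(T* pointer, std::size_t /*count*/) noexcept {
        // The alignment must match the one passed to operator new.
        ::operator delete(pointer, kAlignment);
    }
};

// The allocator is stateless, so all instances compare equal.
template <typename T, typename U>
bool operator==(const MinimalAlignedAllocator<T>&,
                const MinimalAlignedAllocator<U>&) noexcept { return true; }

template <typename T, typename U>
bool operator!=(const MinimalAlignedAllocator<T>&,
                const MinimalAlignedAllocator<U>&) noexcept { return false; }

// Usage sketch: a float vector whose buffer starts on a 32-byte boundary.
inline void minimal_allocator_demo() {
    std::vector<float, MinimalAlignedAllocator<float>> values(1000, 1.0f);
    assert(reinterpret_cast<std::uintptr_t>(values.data()) % 32 == 0);
}
```

Note that std::vector<float, MinimalAlignedAllocator<float>> is a different type from std::vector<float>, so functions that take the latter by reference will not accept it directly.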
