Custom allocator and memory alignment

I'm trying to implement a custom allocator that works with std containers, based on the requirements here: https://en.cppreference.com/w/cpp/named_req/Allocator

I'm currently trying to implement a linear allocator and I'm having a hard time with memory alignment.
After I allocate a block of memory, I'm wondering how much padding I need between each object in the block to optimize CPU reads/writes. I'm not sure whether each object's address should be divisible

  • by the CPU word size (4 bytes on a 32-bit machine and 8 bytes on a 64-bit machine)
  • by sizeof(T)
  • by alignof(T)

I have read different answers in different places.
For example, in this question the accepted answer says:

The usual rule of thumb (straight from Intel's and AMD's optimization manuals) is that every data type should be aligned by its own size. An int32 should be aligned on a 32-bit boundary, an int64 on a 64-bit boundary, and so on. A char will fit just fine anywhere.

So by that answer it looks like the address should be divisible by sizeof(T).

On this question the second answer states that:

The CPU always reads at its word size (4 bytes on a 32-bit processor), so when you do an unaligned address access, on a processor that supports it, the processor is going to read multiple words.

So by that answer it looks like the address should be divisible by the CPU word size.

So I'm seeing some conflicting statements on how to optimize data alignment for CPU reads/writes, and I'm not sure whether I'm misunderstanding something or some of the answers are wrong. Maybe someone could clear up for me what the address should be divisible by.

As a general rule of thumb (that is, do this unless you have a good reason to do otherwise), you want to align elements of a given C++ type to their alignment, i.e., alignof(T). If the type wants to be aligned to a 32-bit boundary (as int is in most common C++ implementations), it will report a matching 4-byte alignment.

Of course, between the base addresses of two different objects of type T there must be at least sizeof(T) bytes of room, which will usually be an integer multiple of its alignment (it is actually quite hard to pass an over-aligned type to a template function, as the compiler will strip any outer alignas attribute).

In most use cases, you will therefore be fine doing the following: find the first base address in your underlying storage that is aligned to alignof(T), then go forward from there in steps of sizeof(T).
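To make those two steps concrete, here is a minimal sketch of the address arithmetic. The helper names align_up and slot_address are made up for illustration, and the bit trick assumes the alignment is a power of two (which alignof(T) always is). In real code, std::align from <memory> performs the same rounding directly on pointers.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: rounds `address` up to the next multiple of `alignment`.
// Assumes `alignment` is a power of two, which holds for every alignof(T).
constexpr std::uintptr_t align_up(std::uintptr_t address, std::size_t alignment) {
    return (address + (alignment - 1)) & ~(static_cast<std::uintptr_t>(alignment) - 1);
}

// Hypothetical helper: address of the index-th T when the storage starts at `base`.
// Step 1: move to the first address aligned to alignof(T).
// Step 2: walk forward in steps of sizeof(T).
template <class T>
constexpr std::uintptr_t slot_address(std::uintptr_t base, std::size_t index) {
    return align_up(base, alignof(T)) + index * sizeof(T);
}
```

For example, with storage starting at address 13 and a 4-byte, 4-aligned type, the first object lands at 16 and the following ones at 20, 24, and so on.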

This way, you rely on the users of your allocator to tell you what they want. This is exactly what you want, as the optimizer may rely on knowledge about alignment and, e.g., emit aligned SSE loads for arrays of double-precision floats, which will cause your program to crash if the data is misaligned.

Going down the rabbit hole

This gives rise to the following possible situations:

  1. Easy type, has word length and word alignment (e.g., int with sizeof(int) = 4 and alignof(int) = 4):
sizeof(T) = 4 and alignof(T) = 4
 0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F 
[aaaaaaaaaa][bbbbbbbbbb][cccccccccc][dddddddddd]
  2. Types whose size is a multiple of their alignment (e.g., using T = int[2]):
sizeof(T) = 8 and alignof(T) = 4
 0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F 
[aaaaaaaaaaaaaaaaaaaaaa][bbbbbbbbbbbbbbbbbbbbbb]
  3. Overaligned types, which have a larger alignment than size (e.g., using T = alignas(8) char[3]). Here be dragons!
sizeof(T) = 3 and alignof(T) = 8
 0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F 
[aaaaaaa]               [bbbbbbb]

Note that there is unused space in the overaligned example. This is necessary, as objects that are aligned to an 8-byte boundary cannot be placed at any other address, leading to potential wastage. The most common use for types such as this is CPU-specific optimizations, such as preventing false sharing. For class types, standard C++ already folds this padding into sizeof (for example, struct alignas(8) S { char c[3]; }; has sizeof(S) == 8), so a size smaller than the alignment is normally only seen with compiler-specific extensions.

  4. Finally, there is the slightly weird case of objects whose size is larger than, but not an integer multiple of, their alignment (e.g., using T = alignas(4) char[5];). This is basically just a small extension of the previous example of overaligned types:
sizeof(T) = 5 and alignof(T) = 4
 0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F 
[aaaaaaaaaaaaa]         [bbbbbbbbbbbbb]

While the alignment would make it possible to place the second object at base address 4, there is already an object there.

Putting all these examples together, the number of bytes needed between the base addresses of two objects of type T is:

template <class T>
inline constexpr auto object_distance = sizeof(T) % alignof(T) == 0 ? sizeof(T) : sizeof(T) + (alignof(T) - sizeof(T) % alignof(T));
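As a quick sanity check, the same round-up rule can be written as a function of a raw size and alignment, so that the four diagrams above can be checked without needing the non-standard types. The name object_distance_for and the struct Padded are assumptions made for this sketch, not part of the original answer.

```cpp
#include <cstddef>

// Hypothetical helper: the formula above, taking size and alignment as plain numbers.
constexpr std::size_t object_distance_for(std::size_t size, std::size_t alignment) {
    return size % alignment == 0 ? size : size + (alignment - size % alignment);
}

// The four situations from the diagrams above.
static_assert(object_distance_for(4, 4) == 4, "case 1: int-like type");
static_assert(object_distance_for(8, 4) == 8, "case 2: size is a multiple of the alignment");
static_assert(object_distance_for(3, 8) == 8, "case 3: alignment larger than size");
static_assert(object_distance_for(5, 4) == 8, "case 4: size not a multiple of the alignment");

// For a standard class type the compiler has already rounded sizeof up,
// so the formula simply gives back sizeof(T).
struct alignas(8) Padded { char bytes[3]; };
static_assert(sizeof(Padded) == 8, "3 payload bytes plus 5 bytes of tail padding");
static_assert(object_distance_for(sizeof(Padded), alignof(Padded)) == sizeof(Padded),
              "the distance equals sizeof for standard types");
```

If all of the static_asserts compile, the distances match the diagrams: 4, 8, 8, and 8 bytes respectively.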

After I allocate a block of memory, I'm wondering how much padding I need between each object in the block to optimize CPU reads/writes.

Precisely zero padding between objects; you're not allowed to add padding. In the C++ standard library allocator model, your allocator<T>::allocate(count) method is required to allocate sufficient space to store an array of count objects of type T. Arrays in C++ are tightly packed; the offset from one T in the array to the next T is required to be sizeof(T).

So you can't insert padding between objects in the allocated storage. You can insert padding at the beginning of the block of memory you allocate, so that the storage satisfies alignof(T) (which your allocator<T>::allocate is also required to respect). But the returned pointer has to be a pointer to the aligned storage for the Ts. So if you have padding at the front of the allocation, you'll need some way to undo the padding when deallocate is called, since it only receives the aligned storage address.
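One common way to "undo" front padding is to stash the original pointer in the bytes just before the aligned block. The sketch below assumes the underlying storage comes from std::malloc; allocate_aligned and deallocate_aligned are hypothetical names, and overflow checks on the size computation are left out for brevity. Since C++17, the aligned overloads of operator new can do this bookkeeping for you.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <new>

// Hypothetical helper: returns `size` bytes aligned to `alignment` (a power of two).
void* allocate_aligned(std::size_t size, std::size_t alignment) {
    // Reserve room for the worst-case front padding plus one stored pointer.
    const std::size_t total = size + (alignment - 1) + sizeof(void*);
    void* raw = std::malloc(total);
    if (raw == nullptr) throw std::bad_alloc();

    // Leave space for the stored pointer, then align what follows.
    void* cursor = static_cast<unsigned char*>(raw) + sizeof(void*);
    std::size_t space = total - sizeof(void*);
    void* aligned = std::align(alignment, size, cursor, space);
    assert(aligned != nullptr);  // cannot fail given how `total` was computed

    // Remember the original pointer right in front of the aligned block.
    std::memcpy(static_cast<unsigned char*>(aligned) - sizeof(void*), &raw, sizeof(raw));
    return aligned;
}

// Hypothetical helper: receives only the aligned address, like deallocate does.
void deallocate_aligned(void* aligned) noexcept {
    void* raw = nullptr;
    std::memcpy(&raw, static_cast<unsigned char*>(aligned) - sizeof(void*), sizeof(raw));
    std::free(raw);
}
```

The caller only ever sees the aligned pointer; the padding and the stored pointer stay private to the two functions.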

When it comes to alignment of structs that contain fundamental types, you're relying on the compiler to impose its alignment requirements on those structs. So for this definition:

struct U
{
  std::int32_t i;
  std::int64_t j;
};

If the compiler deems that it would be more optimal for int64_t values to be on 8-byte alignments, then the compiler will insert appropriate padding between i and j in U. In that case sizeof(U) will be 16, and alignof(U) will be 8.
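You can ask the compiler what it actually decided. The sketch below repeats the struct and checks only properties that should hold on any conforming implementation; the exact numbers (16 and 8) are what typical 64-bit platforms produce and are not guaranteed everywhere. The function padding_between_i_and_j is a made-up helper for illustration.

```cpp
#include <cstddef>
#include <cstdint>

struct U {
    std::int32_t i;
    std::int64_t j;
};

// The struct is at least as strictly aligned as its most demanding member.
static_assert(alignof(U) >= alignof(std::int64_t), "U inherits the alignment of j");
// The compiler places j at an offset that satisfies its alignment.
static_assert(offsetof(U, j) % alignof(std::int64_t) == 0, "j is correctly aligned inside U");
// The size is rounded up so that arrays of U stay aligned without extra padding.
static_assert(sizeof(U) % alignof(U) == 0, "sizeof is a multiple of alignof");

// Hypothetical helper: number of padding bytes the compiler put between i and j.
constexpr std::size_t padding_between_i_and_j() {
    return offsetof(U, j) - sizeof(std::int32_t);
}
```

On a platform where int64_t is 8-byte aligned, padding_between_i_and_j() evaluates to 4.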

Creating that alignment is not your job, and you're not allowed to do it for the compiler. You must simply respect the alignment of any type you're given in your allocator<T>::allocate calls.
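Putting this together for the linear allocator from the question, a minimal sketch might look like the following. Arena, LinearAllocator, and vector_buffer_is_aligned are names invented here; the 4096-byte buffer size is arbitrary; and the sketch ignores thread safety. It relies on std::align to skip just enough bytes in front of each block, and never adds padding between the objects of one allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <vector>

// Hypothetical fixed-size arena that hands out memory front to back.
struct Arena {
    alignas(std::max_align_t) unsigned char bytes[4096];
    std::size_t used = 0;  // offset of the first free byte

    void* allocate(std::size_t size, std::size_t alignment) {
        void* cursor = bytes + used;
        std::size_t space = sizeof(bytes) - used;
        // std::align bumps `cursor` to the next aligned address,
        // or returns nullptr if `size` bytes no longer fit.
        if (std::align(alignment, size, cursor, space) == nullptr) throw std::bad_alloc();
        used = static_cast<std::size_t>(static_cast<unsigned char*>(cursor) - bytes) + size;
        return cursor;
    }
};

template <class T>
struct LinearAllocator {
    using value_type = T;
    Arena* arena;

    explicit LinearAllocator(Arena& a) noexcept : arena(&a) {}
    template <class U>
    LinearAllocator(const LinearAllocator<U>& other) noexcept : arena(other.arena) {}

    T* allocate(std::size_t count) {
        if (count > static_cast<std::size_t>(-1) / sizeof(T)) throw std::bad_array_new_length();
        // Padding only in front of the block; the objects are sizeof(T) apart.
        return static_cast<T*>(arena->allocate(count * sizeof(T), alignof(T)));
    }
    void deallocate(T*, std::size_t) noexcept {}  // memory is reclaimed with the arena
};

template <class T, class U>
bool operator==(const LinearAllocator<T>& a, const LinearAllocator<U>& b) noexcept {
    return a.arena == b.arena;
}
template <class T, class U>
bool operator!=(const LinearAllocator<T>& a, const LinearAllocator<U>& b) noexcept {
    return !(a == b);
}

// Hypothetical demo: misalign the arena on purpose, then let std::vector allocate from it.
bool vector_buffer_is_aligned() {
    Arena arena;
    arena.used = 1;
    LinearAllocator<double> alloc(arena);
    std::vector<double, LinearAllocator<double>> values(alloc);
    for (int k = 0; k < 10; ++k) values.push_back(k * 0.5);
    return reinterpret_cast<std::uintptr_t>(values.data()) % alignof(double) == 0;
}
```

Because deallocate does nothing, memory that the vector gives up while growing is not reused; that is the usual trade-off of a linear allocator.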
