What does it mean for an SSE vector to be "16 byte aligned" and how can I ensure that it is?

I'm working with vectors and matrices right now and it was suggested to me that I should use SSE instead of using float arrays. However, while reading the definitions of the C intrinsics and the assembly instructions, it looks like some functions come in two versions: one where the vector has to be "16 byte aligned" and a slower one where the vector isn't aligned. What does it mean for a vector to be 16 byte aligned? How can I ensure that my vectors are 16 byte aligned?

Alignment ensures that objects are placed at an address that is a multiple of some power of two. 16-byte-aligned means that the numeric value of the address is a multiple of 16. Alignment is important because CPUs are often less efficient at, or downright incapable of, loading memory that doesn't have the required alignment.
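To make "the address is a multiple of 16" concrete, here is a minimal sketch of a check. The helper name is_aligned is made up for illustration, and it assumes the alignment is a power of two.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true when the numeric value of ptr is a multiple
   of alignment. Assumes alignment is a power of two, so the low bits of
   an aligned address are all zero. */
static bool is_aligned(const void *ptr, size_t alignment) {
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}
```

For example, is_aligned(p, 16) tells you whether p is safe to hand to an aligned SSE load.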

Your ABI determines the natural alignment of types. In general, integer types and floating-point types are aligned to either their own size, or the size of the largest object of that kind that your CPU can treat at once, whichever is smaller. For instance, on 64-bit Intel machines, 32-bit integers are aligned on 4 bytes and 64-bit integers are aligned on 8 bytes. (The 128-bit __int128 extension of GCC and Clang is aligned on 16 bytes under the x86-64 System V ABI, while the 32-bit i386 System V ABI aligns 64-bit integers on only 4 bytes.)

The alignment of structures and unions is the same as that of their most aligned field. This means that if your struct contains a field that has a 2-byte alignment and another field that has an 8-byte alignment, the structure will be aligned to 8 bytes.

In C++ (since C++11), you can use the alignof operator, just like the sizeof operator, to get the alignment of a type. In C (since C11), the same construct becomes available when you include <stdalign.h>; alternatively, you can use _Alignof without including anything.
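As a small sketch of both points (querying alignment, and a struct taking the alignment of its most aligned field), assuming a C11 compiler. The struct and function names are invented for the example, and the comments describe common ABIs rather than a guarantee.

```c
#include <stdalign.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical struct mixing a weakly and a strongly aligned field. */
struct mixed {
    int16_t small;  /* typically 2-byte aligned */
    int64_t large;  /* typically 8-byte aligned on x86-64 */
};

/* Prints the alignment of each member type and of the whole struct. */
static void print_alignments(void) {
    printf("int16_t:      %zu\n", alignof(int16_t));
    printf("int64_t:      %zu\n", alignof(int64_t));
    printf("struct mixed: %zu\n", alignof(struct mixed));
}
```

The struct's alignment should match that of int64_t, its most aligned member.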

Since C11 and C++11 there is a standard way to force alignment to a specific value: the alignas specifier (spelled _Alignas in C, or alignas after including <stdalign.h>). Before that you had to rely on compiler-specific extensions, which are still widely used. On Clang and GCC, you can use the __attribute__((aligned(N))) attribute:

struct s_Stuff {
   int var1;
   short  var2;
   char padding[10];
} __attribute__((aligned(16)));

(Example.)

(This attribute is not to be confused with __attribute__((align(N))), which sets the alignment of a variable.)

Off the top of my head, I'm not sure about Visual Studio, but according to SoronelHaetir, that would be __declspec(align(N)). Not sure where it goes on the struct declaration.
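For comparison with the compiler-specific spellings, C11 and C++11 also offer the standard alignas specifier. A minimal sketch, assuming a C11 compiler; the type and variable names are hypothetical.

```c
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical type: aligning the first member to 16 bytes raises the
   alignment of the whole struct to 16. */
struct vec4_storage {
    alignas(16) float lanes[4];
};

/* alignas also works directly on a variable declaration. */
static alignas(16) float scratch[8];
```

Both struct vec4_storage objects and the scratch array then start on a 16-byte boundary, whichever compiler is used.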

In the context of vector instructions, alignment is important because people tend to create arrays of floating-point values and operate on them, instead of using types that are known to be aligned. However, __m128, __m256 and __m512 (and all of their variants, like __m128i and such) from <immintrin.h> (the SSE-only types are also available from <xmmintrin.h> and <emmintrin.h>), if your compiler environment has it, are guaranteed to be aligned on the proper boundaries for use with aligned intrinsics.
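Here is a rough sketch of the aligned and unaligned variants the question mentions, assuming an x86 target with SSE and <xmmintrin.h>. The function names are invented; _mm_load_ps, _mm_loadu_ps and _mm_storeu_ps are the actual SSE intrinsics.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_load_ps, _mm_loadu_ps */

/* Sums four floats using the aligned load. p must be 16-byte aligned,
   otherwise the load may fault. */
static float sum4_aligned(const float *p) {
    __m128 v = _mm_load_ps(p);
    float out[4];
    _mm_storeu_ps(out, v);  /* unaligned store: out has no special alignment */
    return out[0] + out[1] + out[2] + out[3];
}

/* Same computation, but the unaligned load accepts any address. */
static float sum4_unaligned(const float *p) {
    __m128 v = _mm_loadu_ps(p);
    float out[4];
    _mm_storeu_ps(out, v);
    return out[0] + out[1] + out[2] + out[3];
}
```

Calling sum4_aligned on the start of a 16-byte-aligned array is fine; one element further in (data + 1) only sum4_unaligned is safe.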

Depending on your platform, malloc may or may not return memory that is aligned on the correct boundary for vector objects. aligned_alloc was introduced in C11 to address these issues, but not all platforms support it.

  • Apple: did not support aligned_alloc at the time of writing (newer macOS SDKs appear to provide it); malloc returns objects on the most exigent alignment that the platform supports;
  • Windows: does not support aligned_alloc; malloc returns objects aligned on the largest alignment that VC++ will naturally put an object on without an alignment specification; use _aligned_malloc for vector types (and release the memory with _aligned_free);
  • Linux: malloc returns objects aligned on an 8- or 16-byte boundary; use aligned_alloc.
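The per-platform differences above can be hidden behind a small wrapper. This is only a sketch: the names vec_alloc and vec_free are made up, it assumes MSVC's CRT on Windows (detected through _MSC_VER) and a C11 aligned_alloc everywhere else, and the alignment must be a power of two.

```c
#include <stdint.h>
#include <stdlib.h>
#if defined(_MSC_VER)
#include <malloc.h>  /* _aligned_malloc, _aligned_free */
#endif

/* Hypothetical wrapper: returns size bytes aligned to alignment, or NULL. */
static void *vec_alloc(size_t size, size_t alignment) {
#if defined(_MSC_VER)
    return _aligned_malloc(size, alignment);
#else
    /* C11 asks for the size to be a multiple of the alignment: round up. */
    size_t rounded = (size + alignment - 1) & ~(alignment - 1);
    return aligned_alloc(alignment, rounded ? rounded : alignment);
#endif
}

/* Memory from _aligned_malloc must go to _aligned_free, not free. */
static void vec_free(void *ptr) {
#if defined(_MSC_VER)
    _aligned_free(ptr);
#else
    free(ptr);
#endif
}
```

Usage is the same as malloc and free: float *v = vec_alloc(4 * sizeof(float), 16); ... vec_free(v);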

In general, it's possible to request slightly more memory and perform the alignment yourself with minimal penalties (aside from the fact that you're on your own to write a free-like function that will accept a pointer returned by this function):

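#include <stdint.h>
#include <stdlib.h>

/* alignment must be a power of two. The pointer returned by malloc is not
   kept, so the result cannot be passed to free() as-is. */
void* aligned_malloc(size_t size, size_t alignment) {
    uintptr_t alignment_mask = alignment - 1;
    void* memory = malloc(size + alignment_mask);
    if (memory == NULL) return NULL;
    uintptr_t unaligned_ptr = (uintptr_t)memory;
    uintptr_t aligned_ptr = (unaligned_ptr + alignment_mask) & ~alignment_mask;
    return (void*)aligned_ptr;
}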

Purists might argue that treating pointers as integers is evil, but at the time of writing, they probably won't have a practical cross-platform solution to offer in exchange.
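One common way to get the missing free-like function is to stash the raw malloc pointer just below the aligned block. The sketch below is one possible variant, not the answer's original code: the _sketch names are hypothetical, and it assumes the alignment is a power of two no smaller than sizeof(void *).

```c
#include <stdint.h>
#include <stdlib.h>

/* Over-allocates, aligns, and stores the raw malloc pointer in the slot
   immediately before the aligned address. */
static void *aligned_malloc_sketch(size_t size, size_t alignment) {
    size_t extra = alignment - 1 + sizeof(void *);
    void *raw = malloc(size + extra);
    if (raw == NULL) return NULL;
    /* Leave room for the stashed pointer, then round up. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

/* Recovers the raw pointer and hands it back to free(). */
static void aligned_free_sketch(void *ptr) {
    if (ptr != NULL) free(((void **)ptr)[-1]);
}
```

The cost is one extra pointer per allocation on top of the alignment slack.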

xx-byte alignment means that the variable's memory address modulo xx is 0.

Ensuring that is a compiler-specific operation. Visual C++, for example, has __declspec(align(...)), which will work for variables that the compiler allocates (at file or function scope, for example). Alignment is somewhat harder for dynamic memory; you can use _aligned_malloc for that, although your library may already guarantee 16-byte alignment for malloc. It's generally larger alignments that require such a call.

New edit to improve and focus my answer on the specific query

To ensure data alignment in memory, there are specific functions in C to force this (assuming your data is compatible, that is, your data matches or fits evenly into your required alignment).

The function to use is _aligned_malloc (from Microsoft's C runtime) instead of vanilla malloc.

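// Using _aligned_malloc (MSVC; declared in <malloc.h>)
// Note: alignment should be 2^N where N is any positive int.
size_t alignment = 16;
void *ptr = _aligned_malloc(required_size, alignment);  // required_size: number of bytes you need
if (ptr == NULL)
{
    printf_s("Error allocating aligned memory.");
    return -1;
}
// ... use ptr ..., then release it with _aligned_free(ptr), not free(ptr)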

This will (if it succeeds) force your data to align on a 16-byte boundary and should satisfy the requirements for SSE.

Older answer where I waffle on about struct member alignment, which matters but does not directly answer the query

To ensure struct member byte alignment, you can be careful how you arrange members in your structs (largest first), or you can set this (to some degree) in your compiler settings, member attributes or struct attributes.

Assuming a 32-bit machine with 4-byte ints: this struct is still 4-byte aligned in memory (its largest member is 4 bytes), but padded to be 16 bytes in size.

struct s_Stuff {
   int var1;  /* 4 bytes */
   short  var2;  /* 2 bytes */
   char padding[10];  /* ensure total struct size is 16 */
};

The compiler usually pads each member to assist with natural alignment, but the padding may be at the end of the struct too. This is struct member data alignment.

Older compiler struct member alignment settings could look similar to the 2 images below... But this is different from data alignment, which relates to memory allocation and storage of the data.

[Image: from MS Visual Studio 6 C/C++]

[Image: from the Borland 5 C/C++ compiler]

It confuses me that Borland uses the phrase (from the images) "Data Alignment" while MS uses "Struct member alignment". (Although they both refer specifically to struct member alignment.)

To maximise efficiency, you need to code for your hardware (or vector processing in this case), so let's assume 32 bit, 4 byte ints, etc. Then you want to use tight structs to save space, but padded structs may improve speed.

struct s_Stuff {
   float f1;   /* 4 bytes */
   float f2;   /* 4 bytes */
   float f3;   /* 4 bytes */
   short  var2;  /* 2 bytes */
};

This struct may be padded so that its members also align to 4-byte multiples. The compiler will do this unless you specify that it keeps single-byte struct member alignment. So the size on file could be 14 bytes, while in memory each element of an array of this struct would be 16 bytes in size (with 2 bytes wasted), with an unknown data alignment (possibly 8 bytes by default from malloc, but not guaranteed). As mentioned above, you can force the data alignment in memory with _aligned_malloc on some platforms.
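The 14-versus-16-byte difference can be checked directly. A minimal sketch, assuming 4-byte floats, 2-byte shorts and a compiler that understands #pragma pack (GCC, Clang and MSVC do); the struct names are invented for the comparison.

```c
#include <stddef.h>

/* Default layout: trailing padding rounds the size up to a multiple of 4. */
struct s_padded {
    float f1;    /* offset 0 */
    float f2;    /* offset 4 */
    float f3;    /* offset 8 */
    short var2;  /* offset 12, followed by 2 bytes of padding */
};

/* Single-byte member alignment: no padding at all. */
#pragma pack(push, 1)
struct s_packed {
    float f1;
    float f2;
    float f3;
    short var2;
};
#pragma pack(pop)
```

Under those assumptions sizeof(struct s_padded) is 16 and sizeof(struct s_packed) is 14, with var2 at offset 12 in both.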

Also, regarding member alignment in a struct, the compiler will use multiples of the largest member to set the alignment. Or more specifically:

A struct is always aligned to the largest type's alignment requirements

...from here

If you are using a union, you are correct that it is forced to the largest possible struct; see here.

Check that your compiler settings do not contradict your desired struct member alignment / padding too, or else your structs may differ in size from what you expect.

Now, why is it faster? See here, which explains how alignment allows the hardware to transmit discrete chunks of data and maximises the use of the hardware that passes data around. That is, the data does not need to be split up or re-arranged at every stage of the hardware processing.

As a rule, it's best to set your compiler to match your hardware (and platform OS) so that your alignment (and padding) works best with your hardware processing ability. 32-bit machines usually work best with 4-byte (32-bit) member alignment, but then data written to file with 4-byte member alignment can consume more space than wanted.

Specifically regarding SSE vectors, as this link states, 4 * 4 bytes is the best way to ensure 16-byte alignment, perhaps like this. (And they refer to data alignment here.)

struct s_data {
   float array[4];
};

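or simply an array of floats, or doubles. Note that the 16-byte size alone does not place the object on a 16-byte boundary: it still has to be given that alignment, for example with an alignment specifier or with _aligned_malloc as described above.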
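Putting the pieces together, here is a sketch of a 4-float struct that is both 16 bytes in size and 16-byte aligned, used with the aligned SSE intrinsics. It assumes a C11 compiler and an x86 target with SSE; the names s_data_aligned and add4 are hypothetical.

```c
#include <stdalign.h>
#include <xmmintrin.h>

/* Hypothetical variant of s_data with an explicit 16-byte alignment. */
struct s_data_aligned {
    alignas(16) float array[4];
};

/* Adds two vectors lane by lane using aligned loads and an aligned store. */
static struct s_data_aligned add4(const struct s_data_aligned *a,
                                  const struct s_data_aligned *b) {
    struct s_data_aligned r;
    __m128 va = _mm_load_ps(a->array);
    __m128 vb = _mm_load_ps(b->array);
    _mm_store_ps(r.array, _mm_add_ps(va, vb));
    return r;
}
```

Because the alignment is part of the type, every s_data_aligned the compiler allocates is safe to pass to _mm_load_ps; heap allocations still need an aligned allocator.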
