Why use _mm_malloc? (as opposed to _aligned_malloc, aligned_alloc, or posix_memalign)

There are a few options for acquiring an aligned block of memory, but they're very similar, and the choice mostly boils down to which language standard and platforms you're targeting.

C11

void * aligned_alloc (size_t alignment, size_t size)

POSIX

int posix_memalign (void **memptr, size_t alignment, size_t size)

Windows

void * _aligned_malloc(size_t size, size_t alignment);

And of course it's also always an option to align by hand.

Intel offers another option.

Intel

void* _mm_malloc (int size, int align)
void _mm_free (void *p)

Based on source code released by Intel, this seems to be the method of allocating aligned memory their engineers prefer, but I can't find any documentation comparing it to other methods. The closest I found simply acknowledges that other aligned memory allocation routines exist.

https://software.intel.com/en-us/articles/memory-management-for-optimal-performance-on-intel-xeon-phi-coprocessor-alignment-and

To dynamically allocate a piece of aligned memory, use posix_memalign, which is supported by GCC as well as the Intel Compiler. The benefit of using it is that you don't have to change the memory disposal API. You can use free() as you always do. But pay attention to the parameter profile:

int posix_memalign (void **memptr, size_t align, size_t size);

The Intel Compiler also provides another set of memory allocation APIs. C/C++ programmers can use _mm_malloc and _mm_free to allocate and free aligned blocks of memory. For example, the following statement requests a 64-byte aligned memory block for 8 floating point elements.

farray = (float *)_mm_malloc(8*sizeof(float), 64);

Memory that is allocated using _mm_malloc must be freed using _mm_free. Calling free on memory allocated with _mm_malloc or calling _mm_free on memory allocated with malloc will result in unpredictable behavior.

The clear differences from a user perspective are that _mm_malloc requires direct CPU and compiler support, and that memory allocated with _mm_malloc must be freed with _mm_free. Given these drawbacks, what is the reason for ever using _mm_malloc? Can it have a slight performance advantage? Is it a historical accident?

Intel compilers support POSIX (Linux) and non-POSIX (Windows) operating systems, hence cannot rely upon either the POSIX or the Windows function. Thus, a compiler-specific but OS-agnostic solution was chosen.

C11 is a great solution but Microsoft doesn't even support C99 yet, so who knows if they will ever support C11.
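For comparison, here is a minimal sketch of the C11 route, assuming a C library that actually provides aligned_alloc. The helper names round_up and alloc_floats_64 are made up for illustration. C11 as originally published asks for the size to be a multiple of the alignment, so the sketch rounds the request up before calling aligned_alloc.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: round size up to the next multiple of align.
   align must be a power of two. Returns 0 if the result would overflow. */
static size_t round_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    if (size > SIZE_MAX - (align - 1))
        return 0;
    return (size + align - 1) & ~(align - 1);
}

/* Hypothetical helper: allocate n floats on a 64-byte boundary.
   The result is released with plain free(). */
static float *alloc_floats_64(size_t n)
{
    if (n == 0 || n > SIZE_MAX / sizeof(float))
        return NULL;
    size_t bytes = round_up(n * sizeof(float), 64);
    if (bytes == 0)
        return NULL;
    return aligned_alloc(64, bytes);
}
```

A call such as alloc_floats_64(8) requests 32 bytes of payload, which the sketch rounds up to 64 before allocating; the pointer goes back through the ordinary free().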

Update: Unlike the C11/POSIX/Windows allocation functions, the ICC intrinsics include a deallocation function. This allows this API to use a separate heap manager from the default one. I don't know if/when it actually does that, but it can be useful to support this model.

Disclaimer: I work for Intel but have no special knowledge of these decisions, which happened long before I joined the company.

It's possible to take an existing C compiler which does not presently happen to use the identifiers _mm_malloc and _mm_free and define functions with those names which will behave as required. This could be done in one of two ways. One is to have _mm_malloc function as a wrapper on malloc() which asks for a slightly oversized allocation, constructs a pointer to the first suitably aligned address within it that's at least one byte from the beginning, and stores the number of bytes skipped immediately before that address. The other is to have _mm_malloc request large chunks of memory from malloc() and then dispense them piecemeal. In any case, the pointers returned by _mm_malloc() would not be pointers that free() would generally know how to do anything with; calling _mm_free would use the byte immediately preceding the allocation as an aid to finding the real start of the allocation received from malloc, and then pass that to free.
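The first approach (over-allocate, then store the skip count in the byte just before the returned pointer) can be sketched roughly as follows. This is only an illustration of the idea, not Intel's implementation: the names my_aligned_malloc and my_aligned_free are hypothetical, and the sketch assumes the alignment is a power of two no larger than 128 so that the skip count fits in a single byte.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper around malloc(); align must be a power of two
   and at most 128 so the skip count fits in one byte. */
void *my_aligned_malloc(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0 && align <= 128);
    if (size > SIZE_MAX - align)
        return NULL;

    /* Over-allocate by align bytes so that an aligned address at least
       one byte past the start always exists inside the block. */
    unsigned char *raw = malloc(size + align);
    if (raw == NULL)
        return NULL;

    /* Skip between 1 and align bytes to reach the next aligned address. */
    size_t skip = align - ((uintptr_t)raw & (align - 1));
    unsigned char *user = raw + skip;

    /* Record the skip count in the byte just before the returned pointer. */
    user[-1] = (unsigned char)skip;
    return user;
}

void my_aligned_free(void *p)
{
    if (p == NULL)
        return;
    unsigned char *user = p;
    /* Recover the pointer malloc() originally returned, then free it. */
    free(user - user[-1]);
}
```

Passing a pointer from my_aligned_malloc straight to free() would hand free() an address that malloc() never returned, which is why a matching deallocation function is needed with this design.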

If an aligned-allocate function is allowed to use the internals of the malloc and free functions, however, that may eliminate the need for the extra layer of wrapping. It's possible to write _mm_malloc() / _mm_free() functions which wrap malloc / free without knowing anything about their internals, but it requires that _mm_malloc() keep book-keeping information which is separate from that used by malloc / free.

If the author of an aligned-allocate function knows how malloc and free are implemented, it will often be possible to coordinate the design of all the allocation/free functions so that free can distinguish all kinds of allocations and handle them appropriately. No single aligned-allocate implementation would be usable on all malloc / free implementations, however.

I would suggest that the most portable way to write code would probably be to select a couple of symbols that are not used anywhere else for your own allocate and free functions, so that you could then say, e.g.

#define a_alloc(align,sz) _mm_malloc((sz),(align))  /* _mm_malloc takes the size first */
#define a_free(ptr)  _mm_free((ptr))

on compilers that support that, or

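#include <stdlib.h>

static inline void *aa_alloc(size_t align, size_t size)
{
  void *ret = NULL;
  /* posix_memalign takes (memptr, alignment, size) and returns nonzero on failure */
  if (posix_memalign(&ret, align, size) != 0)
    return NULL;
  return ret;
}
#define a_alloc(align,sz) aa_alloc((align),(sz))
#define a_free(ptr)  free((ptr))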

on POSIX systems, etc. For every system it should be possible to define macros or functions that will yield the necessary behavior. [I think it's probably better to use macros consistently than to sometimes use macros and sometimes functions, so that #if defined macroname can test whether things are defined yet.]
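Putting the per-platform definitions behind one set of names might look roughly like the sketch below. It assumes only two targets, Windows (via _aligned_malloc / _aligned_free) and POSIX (via posix_memalign / free); the a_is_aligned macro is a hypothetical extra added here just for checking results, and the feature-test macro at the top is an assumption about what a strict-mode C library needs in order to declare posix_memalign.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#  define _POSIX_C_SOURCE 200112L  /* assumption: lets strict-mode libcs declare posix_memalign */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#if defined(_WIN32)
#  include <malloc.h>
   /* Windows: note the (size, alignment) order and the dedicated free. */
#  define a_alloc(align, sz) _aligned_malloc((sz), (align))
#  define a_free(ptr)        _aligned_free((ptr))
#else
static inline void *aa_alloc(size_t align, size_t size)
{
    void *ret = NULL;
    /* POSIX requires a power-of-two multiple of sizeof(void *). */
    assert(align != 0 && align % sizeof(void *) == 0
           && (align & (align - 1)) == 0);
    return posix_memalign(&ret, align, size) == 0 ? ret : NULL;
}
#  define a_alloc(align, sz) aa_alloc((align), (sz))
#  define a_free(ptr)        free((ptr))
#endif

/* Hypothetical helper for checking the result of a_alloc. */
#define a_is_aligned(ptr, align) (((uintptr_t)(ptr) % (align)) == 0)
```

Calling code then uses only a_alloc and a_free, so switching a given platform to _mm_malloc / _mm_free later would mean changing two macro definitions rather than every call site.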

_mm_malloc seems to have been created before there was a standard aligned_alloc function, and the need to use _mm_free is a quirk of the implementation.

My guess is that, unlike posix_memalign, it doesn't need to over-allocate in order to guarantee alignment; instead it uses a separate alignment-aware allocator. This would save memory when allocating types whose alignment differs from the default alignment (typically 8 or 16 bytes).
