
Fastest way to memset on modern amd64 CPUs

I'd like to fill an array of 4096 bytes (aligned to a 4096-byte boundary) with zeros in amd64 assembly. I'm looking for both portable solutions and solutions tuned to a single CPU type.

I know that rep stosq would do the trick, but is there anything faster? MMX? SSE? How much faster is it? How many bytes can be written to memory in a single instruction (without rep)? We can assume that the cache is empty. I don't need a fully working function implementation; I just need the basic idea and its crucial assembly instruction.

I've just seen the movdqa instruction, which can write 16 bytes at a time. Is it twice as fast as two mov instructions of 8 bytes each?

The answer to your question can be found in the source file memset64.asm in Agner Fog's asmlib.

His code has versions for AVX and SSE. From what I can tell, it uses _mm256_store_ps (vmovaps) for arrays smaller than MemsetCacheLimit. For larger arrays it uses non-temporal stores with _mm256_stream_ps (vmovntps). Several other factors can affect the results; see the code. You could probably get the same performance in most cases from C/C++ using intrinsic functions.
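To make the intrinsics suggestion concrete, here is a minimal sketch in C. It is not taken from asmlib. It uses the 16-byte SSE2 stores (movdqa and movntdq), because SSE2 is available on every amd64 CPU; the AVX intrinsics named above follow the same pattern with 32-byte stores. The names ZERO_PAGE_BYTES, zero_page_sse2 and zero_page_sse2_stream are made up for this example, and which variant is faster depends on the CPU and on whether the page is read again soon, so measure before choosing.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics, baseline on every amd64 CPU */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant for this sketch: the 4096-byte array from the question. */
#define ZERO_PAGE_BYTES 4096

/* Zero the page with ordinary aligned 16-byte stores (movdqa).
 * The data goes through the cache, which helps if it is used again soon. */
static void zero_page_sse2(void *page)
{
    assert(((uintptr_t)page & 15) == 0);  /* aligned stores fault otherwise */
    const __m128i zero = _mm_setzero_si128();
    __m128i *p = (__m128i *)page;
    const size_t n = ZERO_PAGE_BYTES / sizeof(__m128i);  /* 256 stores */

    /* Four stores per iteration cover one 64-byte cache line. */
    for (size_t i = 0; i < n; i += 4) {
        _mm_store_si128(p + i + 0, zero);
        _mm_store_si128(p + i + 1, zero);
        _mm_store_si128(p + i + 2, zero);
        _mm_store_si128(p + i + 3, zero);
    }
}

/* Zero the page with non-temporal 16-byte stores (movntdq).
 * These bypass the cache, which is the approach the answer describes
 * for large arrays; for a single page the benefit is not guaranteed. */
static void zero_page_sse2_stream(void *page)
{
    assert(((uintptr_t)page & 15) == 0);
    const __m128i zero = _mm_setzero_si128();
    __m128i *p = (__m128i *)page;
    const size_t n = ZERO_PAGE_BYTES / sizeof(__m128i);

    for (size_t i = 0; i < n; i += 4) {
        _mm_stream_si128(p + i + 0, zero);
        _mm_stream_si128(p + i + 1, zero);
        _mm_stream_si128(p + i + 2, zero);
        _mm_stream_si128(p + i + 3, zero);
    }
    _mm_sfence();  /* order the streaming stores before later stores */
}
```

Both functions assume the buffer is at least 16-byte aligned, which the 4096-byte alignment in the question guarantees.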

Note that both the built-in memset function in GCC and the version in glibc were not optimized the last time I checked (which is one reason memset is in asmlib).
