
How can I prefetch a memory region most easily?

Background: I've implemented a stochastic algorithm that requires random ordering for best convergence. Doing so obviously destroys memory locality, however. I've found that prefetching the next iteration's data minimizes the performance drop.

I can prefetch n cache lines using _mm_prefetch in a simple, mostly OS- and compiler-portable fashion, but what's the length of a cache line? Right now I'm using a hardcoded value of 64, which seems to be the norm nowadays on x64 processors, but I don't know how to detect this at runtime, and a question about this last year found no simple solution.
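For concreteness, the approach described above might look roughly like the following. This is only a sketch: prefetch_region is a hypothetical helper name, the 64-byte line length is the hardcoded assumption from the question, and _MM_HINT_T0 is just one of the available hints.

```c
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_T0 (x86/x64 only) */

#define CACHE_LINE_LEN 64  /* assumption: typical on current x64 CPUs */

/* Hypothetical helper: issue one prefetch for every cache line that
   overlaps [addr, addr + len). Returns the number of prefetches issued. */
static size_t prefetch_region(const void *addr, size_t len)
{
    if (len == 0)
        return 0;

    const uintptr_t mask  = ~(uintptr_t)(CACHE_LINE_LEN - 1);
    const uintptr_t first = (uintptr_t)addr & mask;              /* line holding the first byte */
    const uintptr_t last  = ((uintptr_t)addr + len - 1) & mask;  /* line holding the last byte */

    size_t count = 0;
    for (uintptr_t p = first; ; p += CACHE_LINE_LEN) {
        _mm_prefetch((const char *)p, _MM_HINT_T0);
        ++count;
        if (p == last)
            break;
    }
    return count;
}
```

Calling prefetch_region(next_item, sizeof *next_item) just before working on the current item is the intended use; a prefetch is only a hint, so guessing a line length that is too small costs some redundant instructions rather than correctness.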

I've seen GetLogicalProcessorInformation on Windows, but I'm leery of using such a complex API for something so simple, and it won't work on Macs or Linux anyhow.
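For comparison, the non-Windows systems do expose the value, each through its own call. The sketch below assumes glibc's sysconf(_SC_LEVEL1_DCACHE_LINESIZE) extension on Linux and the hw.cachelinesize sysctl on macOS; cache_line_size is a hypothetical name, and the fallback of 64 is the same hardcoded guess as before. glibc may report 0 when it cannot determine the size, hence the check.

```c
#include <stddef.h>
#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#elif defined(__linux__)
#include <unistd.h>
#endif

/* Hypothetical helper: ask the OS for the L1 data cache line size,
   falling back to 64 (an assumption) when it cannot be determined. */
static size_t cache_line_size(void)
{
#if defined(__APPLE__)
    size_t line = 0;
    size_t size = sizeof line;
    if (sysctlbyname("hw.cachelinesize", &line, &size, NULL, 0) == 0 && line > 0)
        return line;
#elif defined(__linux__) && defined(_SC_LEVEL1_DCACHE_LINESIZE)
    long line = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    if (line > 0)  /* 0 or -1 means "unknown" */
        return (size_t)line;
#endif
    return 64;  /* hardcoded fallback */
}
```

This still leaves Windows to GetLogicalProcessorInformation, so it does not remove the need for per-platform code.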

Perhaps there's some entirely different API or intrinsic that could prefetch a memory region identified in terms of bytes (or words, or whatever), so that I could prefetch without knowing the cache line length?

Basically, is there a reasonable alternative to _mm_prefetch with #define CACHE_LINE_LEN 64?

There's a question asking just about the same thing here. You can read it from CPUID if you feel like delving into some assembly. You'll have to write platform-specific code for this, of course.
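With GCC or Clang, the CPUID route does not strictly need hand-written assembly, since <cpuid.h> provides __get_cpuid (MSVC has a similar __cpuid intrinsic in <intrin.h>). The sketch below reads the CLFLUSH line size from leaf 1, which is reported in 8-byte units in EBX bits 15:8. Treating that value as the cache line length is an assumption that holds on common x86 parts but is not guaranteed; cpuid_clflush_line_size is a hypothetical name.

```c
#include <stddef.h>
#include <cpuid.h>  /* __get_cpuid; GCC/Clang on x86/x64 only */

/* Hypothetical helper: return the CLFLUSH line size in bytes as reported
   by CPUID leaf 1, or 0 if the CPU does not report it. */
static size_t cpuid_clflush_line_size(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;  /* leaf 1 not available */

    if (!(edx & (1u << 19)))
        return 0;  /* CLFSH feature bit clear: the size field is not valid */

    /* EBX[15:8] holds the line size in units of 8 bytes. */
    return (size_t)((ebx >> 8) & 0xffu) * 8;
}
```

A caller would typically fall back to the hardcoded 64 when this returns 0.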

You're probably already familiar with Agner Fog's optimization manuals, which give the cache information for many popular processors. If you can determine which CPUs you expect to encounter, you can just hard-code the cache line sizes and look up the CPU vendor information to set the line size.
