
How can I prefetch infrequently used code?

I want to prefetch some code into the instruction cache. The code path is used infrequently, but I need it to be in the instruction cache, or at least in L2, for the rare cases when it is used. I have some advance notice of these rare cases. Does _mm_prefetch work for code? Is there a way to get this infrequently used code into cache? For this problem I don't care about portability, so even asm would do.

The answer depends on your CPU architecture.

That said, if you are using gcc or clang, you can use the __builtin_prefetch built-in function to ask the compiler to generate a prefetch instruction. On Pentium 3 and later x86-type architectures, this will generate a PREFETCHh instruction, which requests a load into the data cache hierarchy. Since these architectures have unified L2 and higher-level caches, it may help with code as well.

The function looks like this:

__builtin_prefetch(const void *address, int rw, int locality);

The locality argument should be in the range 0...3. (In gcc and clang it is the third argument; an optional rw argument, 0 for a read or 1 for a write, comes before it.) Assuming locality maps directly to the h part of the PREFETCHh instruction, you want to pass 1 or 2, which ask for the data to be loaded into the L2 and higher-level caches. See Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 2B: Instruction Set Reference, M-Z (PDF), page 4-277. (Find other volumes here.)
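As a rough illustration of the idea above, the sketch below issues one prefetch hint per cache line across the bytes of a function, using locality 2. It assumes gcc or clang, a 64-byte cache line, and a guessed code length; prefetch_code_range and rare_path are hypothetical names, and nothing here guarantees the lines actually end up in L2 or stay there.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, which is typical on current x86 parts. */
#define CACHE_LINE 64

/* Hypothetical helper: issue one prefetch hint per cache line covering
 * [start, start + len). Returns the number of hints issued.
 * A prefetch is only a hint; the CPU is free to ignore it. */
static size_t prefetch_code_range(uintptr_t start, size_t len)
{
    if (len == 0)
        return 0;

    uintptr_t p   = start & ~(uintptr_t)(CACHE_LINE - 1); /* round down */
    uintptr_t end = start + len;
    size_t issued = 0;

    for (; p < end; p += CACHE_LINE) {
        /* rw = 0 (read), locality = 2; both must be compile-time constants */
        __builtin_prefetch((const void *)p, 0, 2);
        issued++;
    }
    return issued;
}

/* Stand-in for the infrequently used code path. */
static int rare_path(int x)
{
    return x * 2 + 1;
}
```

When the advance notice arrives, a call such as prefetch_code_range((uintptr_t)&rare_path, 256) would hint the first 256 bytes of the function; the 256 is a guess, since C has no portable way to ask for a function's size.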

If you're using another compiler that doesn't have __builtin_prefetch, see whether it has the _mm_prefetch function. You may need to include a header file to get that function. For example, on OS X, that function and the constants for the locality argument are declared in xmmintrin.h.
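For compilers that provide the intrinsic instead, a similar sketch with _mm_prefetch might look like the following. It is x86-only, assumes a 64-byte cache line, and uses the _MM_HINT_T1 constant from xmmintrin.h; prefetch_range_t1 is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>   /* _mm_prefetch and the _MM_HINT_* constants (x86 only) */

/* Assumption: 64-byte cache lines. */
#define LINE_BYTES 64

/* Hypothetical helper: hint every cache line that overlaps
 * [start, start + len) using the T1 hint.
 * Returns the number of hints issued. */
static size_t prefetch_range_t1(const void *start, size_t len)
{
    if (len == 0)
        return 0;

    uintptr_t p   = (uintptr_t)start & ~(uintptr_t)(LINE_BYTES - 1);
    uintptr_t end = (uintptr_t)start + len;
    size_t issued = 0;

    for (; p < end; p += LINE_BYTES) {
        /* The hint must be a compile-time constant. */
        _mm_prefetch((const char *)p, _MM_HINT_T1);
        issued++;
    }
    return issued;
}
```

The other hint constants (_MM_HINT_T0, _MM_HINT_T2, _MM_HINT_NTA) select the remaining PREFETCHh variants.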

There isn't any (official [1] x86) instruction to prefetch code, only data. I find this a rather bizarre use case: the code path is known beforehand but executes rarely, and yet there is supposedly a significant benefit in prefetching the code. It would be great to understand how you came to the conclusion that pre-loading the code helps significantly in this special case. Reaching it would require not only showing that the code is significantly slower when it hasn't been hit for a long time, but also determining that there are spare bus cycles to actually load the code before the processor fetches it through its normal mechanism for loading code.
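One crude way to start on the first part of that analysis is to time the rare path directly. The sketch below assumes a POSIX system with clock_gettime; elapsed_ns and rare_path are hypothetical names. A single measurement like this also captures TLB misses, page faults, and branch-predictor warm-up, so it can only suggest, not prove, that instruction-cache misses are the cost.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime under strict -std= modes */
#endif
#include <stdint.h>
#include <time.h>

/* Stand-in for the infrequently used code path. */
static int rare_path(int x)
{
    return x * 2 + 1;
}

/* Hypothetical helper: wall-clock nanoseconds spent in one call to fn(arg).
 * The call's return value is stored in *result. */
static int64_t elapsed_ns(int (*fn)(int), int arg, int *result)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    *result = fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL
         + (int64_t)(t1.tv_nsec - t0.tv_nsec);
}
```

Comparing a call made after the path has been idle for a long time against an immediately repeated call, over many trials and both with and without a preceding prefetch, would give a first indication of whether the prefetch is worth its bus cycles.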

You may be able to use the prefetch instructions that fetch into L2, which is typically shared between the I-cache and the D-cache.

[1] I know there are some "secret" instructions that allow the processor to manipulate cache content, but those would require a lot of extra work, even if you could use them in user-mode code (and I expect this is not kernel-mode code).
