Prefetching data to cache for x86-64

In my application, at one point I need to perform calculations on a large contiguous block of memory (hundreds of MB). My idea is to keep prefetching the part of the block my program will touch next, so that by the time I perform calculations on that portion, the data is already in the cache.

Can someone give me a simple example of how to achieve this with gcc? I read about _mm_prefetch somewhere, but I don't know how to use it properly. Also note that I have a multicore system, but each core will be working on a different region of memory in parallel.

gcc exposes low-level instructions through builtin functions. For your case the relevant one is __builtin_prefetch. However, you should only expect a measurable difference when the access pattern is not easy for the hardware to predict automatically.
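As a rough illustration of how __builtin_prefetch is typically called, here is a minimal sketch that sums a large array of doubles while hinting data a fixed distance ahead. The function name `sum_with_prefetch`, the `PREFETCH_DISTANCE` constant, and the 64-byte cache line size are assumptions made for this example, not something from the question; a good distance has to be found by measurement on the target machine.

```c
#include <stddef.h>

/* Hypothetical tuning knob: how many elements ahead of the current index
   to prefetch. 64 doubles = 512 bytes, i.e. 8 cache lines if a line is
   64 bytes (an assumption; typical on x86-64). */
#define PREFETCH_DISTANCE 64

/* Sum a large array while hinting upcoming data into the cache. */
double sum_with_prefetch(const double *data, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* Issue roughly one hint per assumed 64-byte line (8 doubles),
           and only while the target element is still inside the array. */
        if ((i % 8) == 0 && i + PREFETCH_DISTANCE < n) {
            /* Arguments: address, rw (0 = read, 1 = write),
               locality (0..3, where 3 = high temporal locality). */
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 3);
        }
        total += data[i];
    }
    return total;
}
```

The prefetch is only a hint and does not change the result, so the function returns the same sum with or without it. The _mm_prefetch intrinsic from `<xmmintrin.h>` is used the same way, for example `_mm_prefetch((const char *)&data[i + PREFETCH_DISTANCE], _MM_HINT_T0)`. For a purely sequential scan like this one the hardware prefetcher will probably do just as well, so time both versions before keeping the hint.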

Modern CPUs have pretty good automatic (hardware) prefetching, and you may well find that you do more harm than good if you try to initiate software prefetching. If you find that you actually have a performance problem, there is most likely a lot more "low-hanging fruit" that you can focus on for optimisation. Prefetch tends to be one of the last things to try, when you're desperate for a few more percent of throughput.
