Why is the L3 cache ignored by cachegrind, contradicting documentation?
I want to learn how people do cache optimization, and a friend suggested cachegrind as a useful tool for this.
Valgrind, being a CPU simulator, assumes a 2-level cache when using cachegrind, as its documentation says:
Cachegrind simulates how your program interacts with a machine's cache hierarchy and (optionally) branch predictor. It simulates a machine with independent first-level instruction and data caches (I1 and D1), backed by a unified second-level cache (L2). This exactly matches the configuration of many modern machines.
The next paragraph continues:
However, some modern machines have three or four levels of cache. For these machines (in the cases where Cachegrind can auto-detect the cache configuration) Cachegrind simulates the first-level and last-level caches. The reason for this choice is that the last-level cache has the most influence on runtime, as it masks accesses to main memory.
However, when I ran valgrind on my simple matrix-matrix multiplication code, I got the following output:
==6556== Cachegrind, a cache and branch-prediction profiler
==6556== Copyright (C) 2002-2010, and GNU GPL'd, by Nicholas Nethercote et al.
==6556== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info
==6556== Command: ./a.out
==6556==
--6556-- warning: L3 cache detected but ignored
==6556==
==6556== I refs: 50,986,869
==6556== I1 misses: 1,146
==6556== L2i misses: 1,137
==6556== I1 miss rate: 0.00%
==6556== L2i miss rate: 0.00%
==6556==
==6556== D refs: 20,232,408 (18,893,241 rd + 1,339,167 wr)
==6556== D1 misses: 150,194 ( 144,869 rd + 5,325 wr)
==6556== L2d misses: 10,451 ( 5,506 rd + 4,945 wr)
==6556== D1 miss rate: 0.7% ( 0.7% + 0.3% )
==6556== L2d miss rate: 0.0% ( 0.0% + 0.3% )
==6556==
==6556== L2 refs: 151,340 ( 146,015 rd + 5,325 wr)
==6556== L2 misses: 11,588 ( 6,643 rd + 4,945 wr)
==6556== L2 miss rate: 0.0% ( 0.0% + 0.3% )
According to the documentation, the L1 and L3 caches should have been simulated, but the output says that the L3 cache is being ignored. Why is that?
Also, does cachegrind assume in advance what the L1 and last-level cache sizes are, or does it use the L1 and last-level cache sizes of the CPU it is currently running on?
You're running on an Intel CPU that cachegrind does not appear to fully support. Cachegrind inspects the CPU's cpuid information and works out the cache configuration from a large set of case statements covering different processors.
This is from an unofficial copy of the code, but it is illustrative: https://github.com/koriakin/valgrind/blob/master/cachegrind/cg-x86-amd64.c
/* Intel method is truly wretched. We have to do an insane indexing into an
* array of pre-defined configurations for various parts of the memory
* hierarchy.
* According to Intel Processor Identification, App Note 485.
*/
static
Int Intel_cache_info(Int level, cache_t* I1c, cache_t* D1c, cache_t* L2c)
{
...
case 0x22: case 0x23: case 0x25: case 0x29:
case 0x46: case 0x47: case 0x4a: case 0x4b: case 0x4c: case 0x4d:
case 0xe2: case 0xe3: case 0xe4: case 0xea: case 0xeb: case 0xec:
VG_(dmsg)("warning: L3 cache detected but ignored\n");
break;
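To make the pattern in that excerpt easier to follow, here is a minimal, self-contained sketch of the same idea: walk a list of cpuid descriptor bytes, record the caches the simulator models, and merely warn about L3 descriptors. It is not Valgrind's code. The names `cache_cfg`, `desc_kind`, `classify_descriptor` and `scan_descriptors` are hypothetical, only three non-L3 descriptors are handled, and their meanings (0x30, 0x2c, 0x7d) are assumptions recalled from Intel's descriptor table. The L3 list is copied from the excerpt above.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for a cache description
 * (this is NOT Valgrind's cache_t). */
typedef struct {
    int size_kb;    /* total size in KB */
    int assoc;      /* associativity (ways) */
    int line_size;  /* line size in bytes */
} cache_cfg;

typedef enum { DESC_OTHER, DESC_I1, DESC_D1, DESC_L2, DESC_L3_IGNORED } desc_kind;

/* Map one descriptor byte to a cache level and, where known, its geometry. */
desc_kind classify_descriptor(unsigned char desc, cache_cfg *out)
{
    switch (desc) {
    /* Assumed meanings for a few descriptors; a real table is far longer. */
    case 0x30: *out = (cache_cfg){ 32, 8, 64 };   return DESC_I1;
    case 0x2c: *out = (cache_cfg){ 32, 8, 64 };   return DESC_D1;
    case 0x7d: *out = (cache_cfg){ 2048, 8, 64 }; return DESC_L2;

    /* L3 descriptors: same values as in the excerpt, recognised but unused. */
    case 0x22: case 0x23: case 0x25: case 0x29:
    case 0x46: case 0x47: case 0x4a: case 0x4b: case 0x4c: case 0x4d:
    case 0xe2: case 0xe3: case 0xe4: case 0xea: case 0xeb: case 0xec:
        return DESC_L3_IGNORED;

    default:
        return DESC_OTHER;
    }
}

/* Walk the descriptor bytes, fill in I1/D1/L2, and return how many
 * L3 descriptors were seen and skipped. */
int scan_descriptors(const unsigned char *bytes, size_t n,
                     cache_cfg *i1, cache_cfg *d1, cache_cfg *l2)
{
    int ignored_l3 = 0;
    for (size_t i = 0; i < n; i++) {
        cache_cfg cfg = { 0, 0, 0 };
        switch (classify_descriptor(bytes[i], &cfg)) {
        case DESC_I1: *i1 = cfg; break;
        case DESC_D1: *d1 = cfg; break;
        case DESC_L2: *l2 = cfg; break;
        case DESC_L3_IGNORED:
            fprintf(stderr, "warning: L3 cache detected but ignored\n");
            ignored_l3++;
            break;
        case DESC_OTHER:
            break;
        }
    }
    return ignored_l3;
}
```

Given the descriptor bytes 0x30, 0x2c, 0x7d and 0x4d, `scan_descriptors` fills in I1, D1 and L2 and warns once about the L3 entry. This mirrors the situation in the question, where the L3 cache is reported but left out of the simulation.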