
Why is the L3 cache ignored by cachegrind, contradicting documentation?

I want to learn how people do cache optimization, and a friend suggested cachegrind as a useful tool for this.

Valgrind, being a CPU simulator, assumes a 2-level cache when running cachegrind, as the documentation mentions:

Cachegrind simulates how your program interacts with a machine's cache hierarchy and (optionally) branch predictor. It simulates a machine with independent first-level instruction and data caches (I1 and D1), backed by a unified second-level cache (L2). This exactly matches the configuration of many modern machines.

The next paragraph continues:

However, some modern machines have three or four levels of cache. For these machines (in the cases where Cachegrind can auto-detect the cache configuration) Cachegrind simulates the first-level and last-level caches. The reason for this choice is that the last-level cache has the most influence on runtime, as it masks accesses to main memory.

However, when I ran valgrind on my simple matrix-matrix multiplication code, I got the following output:

==6556== Cachegrind, a cache and branch-prediction profiler
==6556== Copyright (C) 2002-2010, and GNU GPL'd, by Nicholas Nethercote et al.
==6556== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info
==6556== Command: ./a.out
==6556== 
--6556-- warning: L3 cache detected but ignored
==6556== 
==6556== I   refs:      50,986,869
==6556== I1  misses:         1,146
==6556== L2i misses:         1,137
==6556== I1  miss rate:       0.00%
==6556== L2i miss rate:       0.00%
==6556== 
==6556== D   refs:      20,232,408  (18,893,241 rd   + 1,339,167 wr)
==6556== D1  misses:       150,194  (   144,869 rd   +     5,325 wr)
==6556== L2d misses:        10,451  (     5,506 rd   +     4,945 wr)
==6556== D1  miss rate:        0.7% (       0.7%     +       0.3%  )
==6556== L2d miss rate:        0.0% (       0.0%     +       0.3%  )
==6556== 
==6556== L2 refs:          151,340  (   146,015 rd   +     5,325 wr)
==6556== L2 misses:         11,588  (     6,643 rd   +     4,945 wr)
==6556== L2 miss rate:         0.0% (       0.0%     +       0.3%  )

According to the documentation, the L1 and L3 caches should have been simulated, but the output says that the L3 cache is being ignored. Why is that?

Also, does cachegrind assume fixed L1 and last-level cache sizes, or does it use the L1 and last-level cache sizes of the CPU it is currently running on?

You're running on an Intel CPU that cachegrind appears not to fully support. It inspects the cpuid flags and determines support based on a huge set of case statements for different processors.

This is from an unofficial copy of the code, but it is illustrative: https://github.com/koriakin/valgrind/blob/master/cachegrind/cg-x86-amd64.c

/* Intel method is truly wretched.  We have to do an insane indexing into an
 * array of pre-defined configurations for various parts of the memory
 * hierarchy.
 * According to Intel Processor Identification, App Note 485.
 */
static
Int Intel_cache_info(Int level, cache_t* I1c, cache_t* D1c, cache_t* L2c)
{
...
      case 0x22: case 0x23: case 0x25: case 0x29:
      case 0x46: case 0x47: case 0x4a: case 0x4b: case 0x4c: case 0x4d:
      case 0xe2: case 0xe3: case 0xe4: case 0xea: case 0xeb: case 0xec:
          VG_(dmsg)("warning: L3 cache detected but ignored\n");
          break;

