Different read and write counts using cachegrind and callgrind

Question

I am doing some experiments with Cachegrind, Callgrind and gem5. I noticed that a number of accesses are counted as reads by Cachegrind, as writes by Callgrind, and as both reads and writes by gem5.

Let's take a very simple example:

int main() {
    int i, l;

    for (i = 0; i < 1000; i++) {
        l++;
        l++;
        l++;
        l++;
        l++;
        l++;
        l++;
        l++;
        l++;
        l++;
        /* ... l++; repeated 100 times in total ... */
    }
}
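For readers who want to reproduce the experiment without pasting 100 identical lines, the loop body can be generated with the preprocessor. This is only a sketch: the macros INC10 and INC100 and the function name run_increments are hypothetical helpers, not part of the original post, and l is initialised here (unlike above) so that the returned value is well defined. Whether each increment still compiles to a single addl on a stack slot depends on the compiler and flags, so check the generated assembly.

```c
/* Hypothetical helper macros: expand to 10 and 100 increments of x. */
#define INC10(x)  x++; x++; x++; x++; x++; x++; x++; x++; x++; x++;
#define INC100(x) INC10(x) INC10(x) INC10(x) INC10(x) INC10(x) \
                  INC10(x) INC10(x) INC10(x) INC10(x) INC10(x)

/* Runs the same loop as the example: 1000 iterations of 100 increments. */
int run_increments(void)
{
    int i, l = 0;   /* initialised so the result is well defined */

    for (i = 0; i < 1000; i++) {
        INC100(l)
    }
    return l;       /* 1000 * 100 increments */
}
```

Calling run_increments() from main gives a program comparable to ex.c, with one extra store for the initialisation of l.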

I compile with:

gcc ex.c --static -o ex

So basically, according to the asm file, addl $1, -8(%rbp) is executed 100,000 times. Since it's both a read and a write, I was expecting 100k reads and 100k writes. However, cachegrind only counts them as reads and callgrind only as writes.

 % valgrind --tool=cachegrind --I1=512,8,64 --D1=512,8,64 --L2=16384,8,64 ./ex
==15356== Cachegrind, a cache and branch-prediction profiler
==15356== Copyright (C) 2002-2012, and GNU GPL'd, by Nicholas Nethercote et al.
==15356== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==15356== Command: ./ex
==15356== 
--15356-- warning: L3 cache found, using its data for the LL simulation.
==15356== 
==15356== I   refs:      111,535
==15356== I1  misses:        475
==15356== LLi misses:        280
==15356== I1  miss rate:    0.42%
==15356== LLi miss rate:    0.25%
==15356== 
==15356== D   refs:      104,894  (103,791 rd   + 1,103 wr)
==15356== D1  misses:        557  (    414 rd   +   143 wr)
==15356== LLd misses:        172  (     89 rd   +    83 wr)
==15356== D1  miss rate:     0.5% (    0.3%     +  12.9%  )
==15356== LLd miss rate:     0.1% (    0.0%     +   7.5%  )
==15356== 
==15356== LL refs:         1,032  (    889 rd   +   143 wr)
==15356== LL misses:         452  (    369 rd   +    83 wr)
==15356== LL miss rate:      0.2% (    0.1%     +   7.5%  )

- --

 % valgrind --tool=callgrind --I1=512,8,64 --D1=512,8,64 --L2=16384,8,64 ./ex
==15376== Callgrind, a call-graph generating cache profiler
==15376== Copyright (C) 2002-2012, and GNU GPL'd, by Josef Weidendorfer et al.
==15376== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==15376== Command: ./ex
==15376== 
--15376-- warning: L3 cache found, using its data for the LL simulation.
==15376== For interactive control, run 'callgrind_control -h'.
==15376== 
==15376== Events    : Ir Dr Dw I1mr D1mr D1mw ILmr DLmr DLmw
==15376== Collected : 111532 2777 102117 474 406 151 279 87 85
==15376== 
==15376== I   refs:      111,532
==15376== I1  misses:        474
==15376== LLi misses:        279
==15376== I1  miss rate:    0.42%
==15376== LLi miss rate:    0.25%
==15376== 
==15376== D   refs:      104,894  (2,777 rd + 102,117 wr)
==15376== D1  misses:        557  (  406 rd +     151 wr)
==15376== LLd misses:        172  (   87 rd +      85 wr)
==15376== D1  miss rate:     0.5% ( 14.6%   +     0.1%  )
==15376== LLd miss rate:     0.1% (  3.1%   +     0.0%  )
==15376== 
==15376== LL refs:         1,031  (  880 rd +     151 wr)
==15376== LL misses:         451  (  366 rd +      85 wr)
==15376== LL miss rate:      0.2% (  0.3%   +     0.0%  )

Could someone give me a reasonable explanation? Would I be correct to consider that there are in fact ~100k reads and ~100k writes (i.e. 2 cache accesses for an addl)?
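The two summaries above can be reconciled with simple arithmetic: both tools report the same total of 104,894 data references and differ only in how that total is split into reads and writes. A minimal sketch in C, using only the figures printed above (the struct and function names are illustrative and not part of Valgrind):

```c
#include <assert.h>

/* Read/write split of the "D refs" line of one tool's summary. */
struct dref_summary { long reads; long writes; };

/* Total data references reported by a tool. */
long total_refs(struct dref_summary s)
{
    return s.reads + s.writes;
}

/* Number of references that one tool files under reads and the other
 * under writes. Only meaningful when both tools agree on the total. */
long reclassified_refs(struct dref_summary cachegrind,
                       struct dref_summary callgrind)
{
    assert(total_refs(cachegrind) == total_refs(callgrind));
    return cachegrind.reads - callgrind.reads;
}
```

With cachegrind = {103791, 1103} and callgrind = {2777, 102117}, reclassified_refs returns 101,014. That is close to the 100,000 executions of addl on l plus, plausibly, the 1,000 increments of i, which gcc without optimization would be expected to compile to a similar read-modify-write instruction; the small remainder presumably comes from startup code. The numbers are therefore consistent with a difference in classification, not in the number of accesses simulated.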

Answer 1

From the Cachegrind manual, section 5.7.1, Cache Simulation Specifics:

Instructions that modify a memory location (e.g. inc and dec) are counted as doing just a read, i.e. a single data reference. This may seem strange, but since the write can never cause a miss (the read guarantees the block is in the cache) it's not very interesting.
Thus it measures not the number of times the data cache is accessed, but the number of times a data cache miss could occur.

It would seem that callgrind's cache simulation logic differs from cachegrind's. I would think that callgrind should produce the same results as cachegrind, so maybe this is a bug?
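To make the difference concrete, here is a toy model of the three counting conventions discussed in this thread. It is only a sketch: the enum and function names are made up for illustration and do not correspond to Valgrind or gem5 internals, and the "modify as write" policy is inferred from the Callgrind output in the question, not taken from its documentation.

```c
#include <assert.h>
#include <stddef.h>

/* Kinds of data access a single instruction can perform. */
enum access_kind { ACC_READ, ACC_WRITE, ACC_MODIFY };

/* Hypothetical policies for counting a read-modify-write instruction. */
enum modify_policy {
    MODIFY_AS_READ,   /* documented Cachegrind behaviour */
    MODIFY_AS_WRITE,  /* what the Callgrind run in the question appears to do */
    MODIFY_AS_BOTH    /* one read plus one write, as the question reports for gem5 */
};

struct ref_counts { unsigned long reads; unsigned long writes; };

/* Count data references in a trace under the given policy. */
struct ref_counts count_refs(const enum access_kind *trace, size_t n,
                             enum modify_policy policy)
{
    struct ref_counts c = {0, 0};

    assert(trace != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        switch (trace[i]) {
        case ACC_READ:
            c.reads++;
            break;
        case ACC_WRITE:
            c.writes++;
            break;
        case ACC_MODIFY:
            if (policy != MODIFY_AS_WRITE) c.reads++;
            if (policy != MODIFY_AS_READ)  c.writes++;
            break;
        }
    }
    return c;
}
```

Under this model, a trace of 100,000 ACC_MODIFY entries yields 100,000 reads and no writes with MODIFY_AS_READ, the reverse with MODIFY_AS_WRITE, and 100,000 of each with MODIFY_AS_BOTH. Only the last policy reflects the number of cache accesses; the first two keep one reference per instruction, which is enough for counting misses.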

Answer 2

callgrind does not do full cache simulation by default. See here: http://valgrind.org/docs/manual/cl-manual.html#cl-manual.options.cachesimulation

To enable data read access you need to add --cache-sim=yes for callgrind. Having said this, why even use callgrind on this code? There is not a single function call (which is what callgrind is for).

Answer 1
3 2013-05-21 03:33:42

Answer 2
-1 2013-04-22 23:07:44
