Different read and write counts using Cachegrind and Callgrind
I am doing some experiments with Cachegrind, Callgrind and gem5. I noticed that a number of accesses were counted as reads by Cachegrind, as writes by Callgrind, and as both reads and writes by gem5.
Let's take a very simple example:
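#define INC10 l++; l++; l++; l++; l++; l++; l++; l++; l++; l++;

int main(void) {
    int i, l = 0;
    for (i = 0; i < 1000; i++) {
        /* 10 x INC10 = 100 increments per iteration; at -O0 each one
           compiles to a read-modify-write such as addl $1, -8(%rbp) */
        INC10 INC10 INC10 INC10 INC10
        INC10 INC10 INC10 INC10 INC10
    }
    return 0;
}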
I compile with:
gcc ex.c --static -o ex
So basically, according to the asm file, addl $1, -8(%rbp) is executed 100,000 times. Since it is both a read and a write, I was expecting 100k reads and 100k writes. However, Cachegrind counts them only as reads and Callgrind only as writes.
% valgrind --tool=cachegrind --I1=512,8,64 --D1=512,8,64 --L2=16384,8,64 ./ex
==15356== Cachegrind, a cache and branch-prediction profiler
==15356== Copyright (C) 2002-2012, and GNU GPL'd, by Nicholas Nethercote et al.
==15356== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==15356== Command: ./ex
==15356==
--15356-- warning: L3 cache found, using its data for the LL simulation.
==15356==
==15356== I refs: 111,535
==15356== I1 misses: 475
==15356== LLi misses: 280
==15356== I1 miss rate: 0.42%
==15356== LLi miss rate: 0.25%
==15356==
==15356== D refs: 104,894 (103,791 rd + 1,103 wr)
==15356== D1 misses: 557 ( 414 rd + 143 wr)
==15356== LLd misses: 172 ( 89 rd + 83 wr)
==15356== D1 miss rate: 0.5% ( 0.3% + 12.9% )
==15356== LLd miss rate: 0.1% ( 0.0% + 7.5% )
==15356==
==15356== LL refs: 1,032 ( 889 rd + 143 wr)
==15356== LL misses: 452 ( 369 rd + 83 wr)
==15356== LL miss rate: 0.2% ( 0.1% + 7.5% )
% valgrind --tool=callgrind --I1=512,8,64 --D1=512,8,64 --L2=16384,8,64 ./ex
==15376== Callgrind, a call-graph generating cache profiler
==15376== Copyright (C) 2002-2012, and GNU GPL'd, by Josef Weidendorfer et al.
==15376== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==15376== Command: ./ex
==15376==
--15376-- warning: L3 cache found, using its data for the LL simulation.
==15376== For interactive control, run 'callgrind_control -h'.
==15376==
==15376== Events : Ir Dr Dw I1mr D1mr D1mw ILmr DLmr DLmw
==15376== Collected : 111532 2777 102117 474 406 151 279 87 85
==15376==
==15376== I refs: 111,532
==15376== I1 misses: 474
==15376== LLi misses: 279
==15376== I1 miss rate: 0.42%
==15376== LLi miss rate: 0.25%
==15376==
==15376== D refs: 104,894 (2,777 rd + 102,117 wr)
==15376== D1 misses: 557 ( 406 rd + 151 wr)
==15376== LLd misses: 172 ( 87 rd + 85 wr)
==15376== D1 miss rate: 0.5% ( 14.6% + 0.1% )
==15376== LLd miss rate: 0.1% ( 3.1% + 0.0% )
==15376==
==15376== LL refs: 1,031 ( 880 rd + 151 wr)
==15376== LL misses: 451 ( 366 rd + 85 wr)
==15376== LL miss rate: 0.2% ( 0.3% + 0.0% )
Could someone give me a reasonable explanation? Would I be correct to consider that there are in fact ~100k reads and ~100k writes (i.e. two cache accesses per addl)?
From the Cachegrind manual, 5.7.1. Cache Simulation Specifics:
Instructions that modify a memory location (e.g. inc and dec) are counted as doing just a read, i.e. a single data reference. This may seem strange, but since the write can never cause a miss (the read guarantees the block is in the cache) it's not very interesting.
Thus it measures not the number of times the data cache is accessed, but the number of times a data cache miss could occur. 因此,它不度量数据高速缓存被访问的次数,而是度量数据高速缓存未命中的次数。
It would seem that Callgrind's cache simulation logic is different from Cachegrind's. I would think that Callgrind should produce the same results as Cachegrind, so maybe this is a bug?
Callgrind does not do full cache simulation by default; see here: http://valgrind.org/docs/manual/cl-manual.html#cl-manual.options.cachesimulation
To enable counting of data read accesses, you need to add --cache-sim=yes for Callgrind. Having said this, why even use Callgrind on this code? There is not a single function call in it (and analyzing calls is what Callgrind is for).