Why isn't cachegrind completely deterministic?

Inspired by SQLite, I'm looking at using valgrind's "cachegrind" tool to do reproducible performance benchmarking. The numbers it outputs are much more stable than any other method of timing I've found, but they're still not deterministic. As an example, here's a simple C program:

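int main(void) {
  /* Initialised explicitly: reading an uninitialised variable is undefined behaviour. */
  volatile int x = 0;
  while (x < 1000000) {
    x++;
  }
  return 0;
}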

If I compile it and run it under cachegrind, I get the following results:

$ gcc -O2 x.c -o x
$ valgrind --tool=cachegrind ./x
==11949== Cachegrind, a cache and branch-prediction profiler
==11949== Copyright (C) 2002-2015, and GNU GPL'd, by Nicholas Nethercote et al.
==11949== Using Valgrind-3.11.0.SVN and LibVEX; rerun with -h for copyright info
==11949== Command: ./x
==11949==
--11949-- warning: L3 cache found, using its data for the LL simulation.
==11949==
==11949== I   refs:      11,158,333
==11949== I1  misses:         3,565
==11949== LLi misses:         2,611
==11949== I1  miss rate:       0.03%
==11949== LLi miss rate:       0.02%
==11949==
==11949== D   refs:       4,116,700  (3,552,970 rd   + 563,730 wr)
==11949== D1  misses:        21,119  (   19,041 rd   +   2,078 wr)
==11949== LLd misses:         7,487  (    6,148 rd   +   1,339 wr)
==11949== D1  miss rate:        0.5% (      0.5%     +     0.4%  )
==11949== LLd miss rate:        0.2% (      0.2%     +     0.2%  )
==11949==
==11949== LL refs:           24,684  (   22,606 rd   +   2,078 wr)
==11949== LL misses:         10,098  (    8,759 rd   +   1,339 wr)
==11949== LL miss rate:         0.1% (      0.1%     +     0.2%  )
$ valgrind --tool=cachegrind ./x
==11982== Cachegrind, a cache and branch-prediction profiler
==11982== Copyright (C) 2002-2015, and GNU GPL'd, by Nicholas Nethercote et al.
==11982== Using Valgrind-3.11.0.SVN and LibVEX; rerun with -h for copyright info
==11982== Command: ./x
==11982==
--11982-- warning: L3 cache found, using its data for the LL simulation.
==11982==
==11982== I   refs:      11,159,225
==11982== I1  misses:         3,611
==11982== LLi misses:         2,611
==11982== I1  miss rate:       0.03%
==11982== LLi miss rate:       0.02%
==11982==
==11982== D   refs:       4,117,029  (3,553,176 rd   + 563,853 wr)
==11982== D1  misses:        21,174  (   19,090 rd   +   2,084 wr)
==11982== LLd misses:         7,496  (    6,154 rd   +   1,342 wr)
==11982== D1  miss rate:        0.5% (      0.5%     +     0.4%  )
==11982== LLd miss rate:        0.2% (      0.2%     +     0.2%  )
==11982==
==11982== LL refs:           24,785  (   22,701 rd   +   2,084 wr)
==11982== LL misses:         10,107  (    8,765 rd   +   1,342 wr)
==11982== LL miss rate:         0.1% (      0.1%     +     0.2%  )
$

In this case, "I refs" differs by only 0.008% between the two runs, but I still wonder why the counts differ at all. In more complex programs (ones that run for tens of milliseconds) they can vary by more. Is there any way to make the runs completely reproducible?

At the end of a thread on gmane.comp.debugging.valgrind, Nicholas Nethercote (a Mozilla developer who works on the Valgrind dev team) says that minor variations are common when using Cachegrind (from which I infer that they do not lead to major problems).

Cachegrind's manual mentions that its results are very sensitive. For instance, on Linux, address space randomisation (used to improve security) can be a source of the non-determinism.

Another thing worth noting is that results are very sensitive. Changing the size of the executable being profiled, or the sizes of any of the shared libraries it uses, or even the length of their file names, can perturb the results. Variations will be small, but don't expect perfectly repeatable results if your program changes at all.

More recent GNU/Linux distributions do address space randomisation, in which identical runs of the same program have their shared libraries loaded at different locations, as a security measure. This also perturbs the results.

While these factors mean you shouldn't trust the results to be super-accurate, they should be close enough to be useful.
