
cachegrind counts do not reflect real performance

Two versions of the same algorithm yield different total instruction fetch counts (Ir) and cycle estimates (CEst) under valgrind/cachegrind. The difference is about 25%. Process timing, however, is very similar (it is actually slightly shorter for the version that cachegrind reports as slower):

  • version 1:

     Ir: 146,328,018,245   CEst: 152,553,736,055   timing: 17.93 s
  • version 2:

     Ir: 185,221,836,610   CEst: 197,531,381,950   timing: 17.53 s
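
To make the size of the mismatch concrete, the figures above can be turned into ratios. This is only a small arithmetic sketch: the dictionary layout and the names `runs` and `ratio` are made up for illustration, and the numbers are copied from the two runs listed above.

```python
# Figures copied from the two runs reported above.
runs = {
    "version 1": {"Ir": 146_328_018_245, "CEst": 152_553_736_055, "seconds": 17.93},
    "version 2": {"Ir": 185_221_836_610, "CEst": 197_531_381_950, "seconds": 17.53},
}


def ratio(metric):
    """Return version 2 divided by version 1 for the given metric."""
    return runs["version 2"][metric] / runs["version 1"][metric]


ir_ratio = ratio("Ir")         # instruction fetches
cest_ratio = ratio("CEst")     # cachegrind cycle estimate
time_ratio = ratio("seconds")  # measured process time

# Print each metric as a signed percentage change from version 1 to version 2.
for name, value in [("Ir", ir_ratio), ("CEst", cest_ratio), ("timing", time_ratio)]:
    print(f"{name}: version 2 vs version 1 = {(value - 1) * 100:+.1f}%")
```

The counts and the cycle estimate move in one direction by a double-digit percentage, while the measured time moves slightly in the other direction, which is the inconsistency being asked about.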

Is this behaviour expected? How can I learn more about why version 1 takes longer to run despite its lower counts?

I discovered that the inconsistency is due to different compiler options being used for the cachegrind runs and for the timing runs. In particular, I had disabled function inlining for the cachegrind runs (so that I could get meaningful per-function counts), so the two sets of measurements came from differently compiled binaries.
