cachegrind counts do not reflect actual performance

Question

Two versions of the same algorithm yield different total instruction fetch counts and cycle estimates under valgrind/cachegrind. The difference is about 25%. Process timing, however, is very similar (it is actually shorter for the version that cachegrind reports as slower):

version 1:

 Ir: 146,328,018,245 CEst: 152,553,736,055 timing: 17.93 s

version 2:

 Ir: 185,221,836,610 CEst: 197,531,381,950 timing: 17.53 s

Is this behaviour expected? How can I learn more about why version 1 is slower?
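To make the mismatch concrete, the figures quoted above can be compared as relative changes from version 1 to version 2. This is only a minimal sketch: the numbers are copied from the question, while the `runs` dictionary and the `relative_change` helper are illustrative names, not part of valgrind or cachegrind.

```python
# Figures copied from the question; the layout and names are illustrative.
runs = {
    "version 1": {"Ir": 146_328_018_245, "CEst": 152_553_736_055, "seconds": 17.93},
    "version 2": {"Ir": 185_221_836_610, "CEst": 197_531_381_950, "seconds": 17.53},
}


def relative_change(old, new):
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100.0


# Compare each metric of version 2 against version 1.
for metric in ("Ir", "CEst", "seconds"):
    change = relative_change(runs["version 1"][metric], runs["version 2"][metric])
    print(f"{metric:>7}: {change:+.1f}%")
```

Under these figures, the instruction count and the cycle estimate rise by somewhat more than a quarter, while the measured time falls by about 2%. The two kinds of measurement point in opposite directions, which is the inconsistency the question asks about.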

Answer 1

I discovered that the inconsistency was due to the different compiler options used for the cachegrind runs and for the timing runs. In particular, I had disabled function inlining for the cachegrind runs (so that I could get meaningful per-function counts).

Answer 1: 0 votes, accepted, 2012-10-18 12:46:54
