
What exactly is self time in C++ Callgrind?

2020-02-05 22:24:28 · Tags: c++, valgrind, self, callgrind

I am programming in C++ (on Linux) and I have recently started to use Valgrind/Callgrind to optimise my code. After reading a couple of tutorials, it seems that focusing on the functions with the highest 'self' cost is a good idea.

I found two functions with high self cost (they are both called >1M times and have >10% self cost each, relative to the entire program execution time). In KCachegrind it shows:

Callgrind, however, does not tell me which parts of the function make up that self cost, which makes it difficult to optimise the code. What exactly is self cost and how can I attempt to reduce it?

My understanding/guess is that self cost includes reading/writing data, cache misses, basic maths operations, copying things on the stack (including function arguments), etc. How do I know which one it is so that I can address it?

Thanks

1 Answer

There are two ways that Callgrind/KCachegrind can represent times.

% Relative. This is the default, and all times are represented as a percentage of the total time.
Absolute. This is a count of the "Cycle Estimation", which is based on various "events" such as instruction reads and data cache misses. By default Callgrind counts only instruction reads; you will need to add the option --cache-sim=yes for cache simulation and --branch-sim=yes for branch predictor simulation. Be aware that Valgrind has only a simple cache simulation and a rudimentary branch predictor.

"Self" is the time spent in the function itself, not counting any functions that it calls. "Inclusive" is the time spent in the function plus all the functions that it calls, transitively.

If you want to see a breakdown of the time spent in a function, you need to compile your application with debug information. Then, after running your application under Callgrind and opening the output file in KCachegrind, you can look at the "Source Code" tab in the top right pane. This should give an indication of the time spent on each line of the function.