Understanding Linux perf report output

Question

Though I can intuitively get most of the results, I'm having a hard time fully understanding the output of the perf report command, especially where the call graph is concerned, so I wrote a stupid test to settle this issue of mine once and for all.

The stupid test

I compiled what follows with:

gcc -Wall -pedantic perf-test.c -o perf-test -lm

No aggressive optimizations, to avoid inlining and such.

#include <math.h>

#define N 10000000UL

#define USELESSNESS(n)                          \
    do {                                        \
        unsigned long i;                        \
        double x = 42;                          \
        for (i = 0; i < (n); i++) x = sin(x);   \
    } while (0)

void baz()
{
    USELESSNESS(N);
}

void bar()
{
    USELESSNESS(2 * N);
    baz();
}

void foo()
{
    USELESSNESS(3 * N);
    bar();
    baz();
}

int main()
{
    foo();
    return 0;
}

Flat profiling

perf record ./perf-test
perf report

With these I get:

  94,44%  perf-test  libm-2.19.so       [.] __sin_sse2
   2,09%  perf-test  perf-test          [.] sin@plt
   1,24%  perf-test  perf-test          [.] foo
   0,85%  perf-test  perf-test          [.] baz
   0,83%  perf-test  perf-test          [.] bar

Which sounds reasonable, since the heavy work is actually performed by __sin_sse2 and sin@plt is probably just a wrapper, while the overhead of my own functions accounts for just the loop. Overall: 3*N iterations for foo, 2*N for each of the other two.
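As a sanity check on those proportions, here is a minimal sketch that mirrors the call structure of the test program but counts loop iterations (in units of N) instead of calling sin. The names loop_units, count_foo, count_bar, count_baz and share_percent are made up for this illustration and belong neither to the test program nor to perf. It assumes every loop iteration costs the same no matter which function runs it.

```c
#include <assert.h>

/* Loop iterations run by each function itself, in units of N (hypothetical helper type). */
struct loop_units {
    unsigned long foo;
    unsigned long bar;
    unsigned long baz;
};

/* baz runs USELESSNESS(N): one unit per call. */
void count_baz(struct loop_units *u)
{
    u->baz += 1;
}

/* bar runs USELESSNESS(2 * N), then calls baz. */
void count_bar(struct loop_units *u)
{
    u->bar += 2;
    count_baz(u);
}

/* foo runs USELESSNESS(3 * N), then calls bar and baz. */
void count_foo(struct loop_units *u)
{
    u->foo += 3;
    count_bar(u);
    count_baz(u);
}

/* Share of `part` in `total`, as a percentage. */
double share_percent(unsigned long part, unsigned long total)
{
    assert(total > 0);
    return 100.0 * (double)part / (double)total;
}
```

Calling count_foo on a zeroed struct gives 3, 2 and 2 units for foo, bar and baz, i.e. about 42.9%, 28.6% and 28.6% of the loop work. The measured figures for the three functions (1,24%, 0,83% and 0,85%) split their combined 2,92% in roughly the same way.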

Hierarchical profiling

perf record -g ./perf-test
perf report -G
perf report

Now I get two overhead columns: Children (the output is sorted by this one by default) and Self (the same overhead as in the flat profile).

Here is where I start feeling I'm missing something: regardless of whether I use -G or not, I'm unable to explain the hierarchy in terms of "x calls y" or "y is called by x". For example:

without -G ("y is called by x"):
```
-   94,34%    94,06%  perf-test  libm-2.19.so       [.] __sin_sse2
   - __sin_sse2
      + 43,67% foo
      + 41,45% main
      + 14,88% bar
-   37,73%     0,00%  perf-test  perf-test          [.] main
     main
     __libc_start_main
-   23,41%     1,35%  perf-test  perf-test          [.] foo
     foo
     main
     __libc_start_main
-    6,43%     0,83%  perf-test  perf-test          [.] bar
     bar
     foo
     main
     __libc_start_main
-    0,98%     0,98%  perf-test  perf-test          [.] baz
   - baz
      + 54,71% foo
      + 45,29% bar
```
1. Why is __sin_sse2 called by main (indirectly?), foo and bar, but not by baz?
2. Why do functions sometimes have a percentage and a hierarchy attached (e.g. the last instance of baz) and sometimes not (e.g. the last instance of bar)?

with -G ("x calls y"):
```
-   94,34%    94,06%  perf-test  libm-2.19.so       [.] __sin_sse2
   + __sin_sse2
   + __libc_start_main
   + main
-   37,73%     0,00%  perf-test  perf-test          [.] main
   - main
      + 62,05% foo
      + 35,73% __sin_sse2
        2,23% sin@plt
-   23,41%     1,35%  perf-test  perf-test          [.] foo
   - foo
      + 64,40% __sin_sse2
      + 29,18% bar
      + 3,98% sin@plt
        2,44% baz
     __libc_start_main
     main
     foo
```
1. How should I interpret the first three entries under __sin_sse2?
2. main calls foo and that's OK, but if it calls __sin_sse2 and sin@plt (indirectly?), why does it not also call bar and baz?
3. Why do __libc_start_main and main appear under foo? And why does foo appear twice?

My suspicion is that there are two levels in this hierarchy, of which the second actually represents the "x calls y"/"y is called by x" semantics, but I'm tired of guessing, so I'm asking here. And the documentation doesn't seem to help.

Sorry for the long post, but I hope that all this context may help or act as a reference for someone else too.

Answer 1

Alright, well, let's temporarily ignore the difference between caller and callee call graphs, mostly because when I compare the results of these two options on my machine, I only see effects inside the kernel.kallsyms DSO, for reasons I don't understand (I'm relatively new to this myself).

I found that for your example, it's a little easier to read the whole tree. So, using --stdio, let's look at the whole tree for __sin_sse2:

# Overhead    Command      Shared Object                  Symbol
# ........  .........  .................  ......................
#
    94.72%  perf-test  libm-2.19.so       [.] __sin_sse2
            |
            --- __sin_sse2
               |
               |--44.20%-- foo
               |          |
               |           --100.00%-- main
               |                     __libc_start_main
               |                     _start
               |                     0x0
               |
               |--27.95%-- baz
               |          |
               |          |--51.78%-- bar
               |          |          foo
               |          |          main
               |          |          __libc_start_main
               |          |          _start
               |          |          0x0
               |          |
               |           --48.22%-- foo
               |                     main
               |                     __libc_start_main
               |                     _start
               |                     0x0
               |
                --27.84%-- bar
                          |
                           --100.00%-- foo
                                     main
                                     __libc_start_main
                                     _start
                                     0x0

So, the way I read this is: 44% of the time, sin is called from foo; 27% of the time it's called from baz, and 27% from bar.

The documentation for -g is instructive:

 -g [type,min[,limit],order[,key]], --call-graph
       Display call chains using type, min percent threshold, optional print limit and order. type can be either:

       ·   flat: single column, linear exposure of call chains.

       ·   graph: use a graph tree, displaying absolute overhead rates.

       ·   fractal: like graph, but displays relative rates. Each branch of the tree is considered as a new profiled object.

               order can be either:
               - callee: callee based call graph.
               - caller: inverted caller based call graph.

               key can be:
               - function: compare on functions
               - address: compare on individual code addresses

               Default: fractal,0.5,callee,function.

The important piece here is that the default is fractal, and in fractal mode, each branch is a new object.

So, you can see that 50% of the time that baz is called, it's called from bar, and the other 50% it's called from foo.
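To turn such relative figures back into a share of all samples, you can multiply down the branch. The helper below is only a sketch: fractal_to_absolute is a hypothetical name, not anything perf provides, and it assumes each fractal percentage is a plain fraction of its parent entry, so treat the result as an estimate rather than as what perf would print.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Estimate the absolute percentage of a call chain from a fractal listing:
 * start from the root entry's overhead and scale it by each relative
 * percentage found while walking down the branch.
 */
double fractal_to_absolute(double root_percent,
                           const double *relative_percents, size_t n)
{
    double result = root_percent;
    size_t i;

    for (i = 0; i < n; i++) {
        /* Each relative figure is expected to be a percentage of its parent. */
        assert(relative_percents[i] >= 0.0 && relative_percents[i] <= 100.0);
        result *= relative_percents[i] / 100.0;
    }
    return result;
}
```

For the chain where baz is called from bar in the tree above, passing a root of 94.72 and the relative values {27.95, 51.78} comes out at about 13.7% of all samples.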

This isn't always the most useful measure, so it's instructive to look at the results using -g graph:

94.72%  perf-test  libm-2.19.so       [.] __sin_sse2
        |
        --- __sin_sse2
           |
           |--41.87%-- foo
           |          |
           |           --41.48%-- main
           |                     __libc_start_main
           |                     _start
           |                     0x0
           |
           |--26.48%-- baz
           |          |
           |          |--13.50%-- bar
           |          |          foo
           |          |          main
           |          |          __libc_start_main
           |          |          _start
           |          |          0x0
           |          |
           |           --12.57%-- foo
           |                     main
           |                     __libc_start_main
           |                     _start
           |                     0x0
           |
            --26.38%-- bar
                      |
                       --26.17%-- foo
                                 main
                                 __libc_start_main
                                 _start
                                 0x0

This switches to absolute percentages, where each percentage is the share of total time reported for that call chain. So foo->bar is 26% of the total ticks (and bar in turn calls baz), and foo->baz (direct) is 12% of the total ticks.

I still have no idea why I don't see any differences between the callee and caller graphs from the perspective of __sin_sse2, though.
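Going the other way, the relative figures in the default fractal listing look like each entry's share among its siblings in this absolute listing. Here is a small sketch of that check. sibling_share_percent is a hypothetical helper, and normalising by the sum of the siblings is an assumption inferred from comparing the two listings, not something stated in the perf documentation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Given the absolute percentages of all sibling entries at one level of a
 * "graph" call tree, return the share of the entry at `index` relative to
 * the sum of its siblings, as a percentage.
 */
double sibling_share_percent(const double *absolute_percents, size_t n,
                             size_t index)
{
    double sum = 0.0;
    size_t i;

    assert(index < n);
    for (i = 0; i < n; i++)
        sum += absolute_percents[i];
    assert(sum > 0.0);

    return 100.0 * absolute_percents[index] / sum;
}
```

With the two callers of baz from the listing, {13.50, 12.57}, index 0 gives about 51.8% and index 1 about 48.2%, in line with the 51.78% and 48.22% shown in the fractal output.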

Update

One thing I did change from your command line is how the call graphs were gathered. By default, Linux perf reconstructs call stacks using the frame pointer method. This can be a problem when the compiler uses -fomit-frame-pointer by default. So I used

perf record --call-graph dwarf ./perf-test

Answer 1
7 2015-02-04 15:08:49