了解Linux perf报告输出

Question

虽然我可以直观地得到大部分结果，但我很难完全理解perf report命令的输出，特别是对于调用图的问题，所以我写了一个愚蠢的测试，一劳永逸地解决了我的这个问题。

愚蠢的考验

我编写了以下内容：

gcc -Wall -pedantic -lm perf-test.c -o perf-test

没有积极的优化来避免内联等。

#include <math.h>

#define N 10000000UL

#define USELESSNESS(n)                          \
    do {                                        \
        unsigned long i;                        \
        double x = 42;                          \
        for (i = 0; i < (n); i++) x = sin(x);   \
    } while (0)

void baz()
{
    USELESSNESS(N);
}

void bar()
{
    USELESSNESS(2 * N);
    baz();
}

void foo()
{
    USELESSNESS(3 * N);
    bar();
    baz();
}

int main()
{
    foo();
    return 0;
}

平面剖析

perf record ./perf-test
perf report

有了这些我得到：

  94,44%  perf-test  libm-2.19.so       [.] __sin_sse2
   2,09%  perf-test  perf-test          [.] sin@plt
   1,24%  perf-test  perf-test          [.] foo
   0,85%  perf-test  perf-test          [.] baz
   0,83%  perf-test  perf-test          [.] bar

这听起来很合理，因为繁重的工作实际上是由__sin_sse2执行的，而sin@plt可能只是一个包装器，而我的函数的开销只考虑了循环，整体： foo为3*N次迭代，另一次为2*N二。

分层分析

perf record -g ./perf-test
perf report -G
perf report

现在我得到的开销列是两个： Children （默认情况下输出由此排序）和Self （平面配置文件的相同开销）。

这是我开始觉得我错过了一些东西：不管我使用-G的事实，我无法用“x调用y”或“y调用x”来解释层次结构，例如：

没有-G （“y由x调用”）：

 - 94,34% 94,06% perf-test libm-2.19.so [.] __sin_sse2 - __sin_sse2 + 43,67% foo + 41,45% main + 14,88% bar - 37,73% 0,00% perf-test perf-test [.] main main __libc_start_main - 23,41% 1,35% perf-test perf-test [.] foo foo main __libc_start_main - 6,43% 0,83% perf-test perf-test [.] bar bar foo main __libc_start_main - 0,98% 0,98% perf-test perf-test [.] baz - baz + 54,71% foo + 45,29% bar

为什么__sin_sse2由main （间接？）， foo和bar调用，而不是由baz调用？
为什么函数有时会附加百分比和层次结构（例如， baz的最后一个实例），有时不会（例如， bar的最后一个实例）？

与-G （“x调用y”）：

 - 94,34% 94,06% perf-test libm-2.19.so [.] __sin_sse2 + __sin_sse2 + __libc_start_main + main - 37,73% 0,00% perf-test perf-test [.] main - main + 62,05% foo + 35,73% __sin_sse2 2,23% sin@plt - 23,41% 1,35% perf-test perf-test [.] foo - foo + 64,40% __sin_sse2 + 29,18% bar + 3,98% sin@plt 2,44% baz __libc_start_main main foo

我应该如何解释__sin_sse2下的前三个条目？
main调用foo ，这没关系，但为什么如果它调用__sin_sse2和sin@plt （间接？）它也不会调用bar和baz ？
为什么__libc_start_main和main出现在foo下？ 为什么foo出现两次？

怀疑是这个层次结构有两个层次，其中第二层实际上代表“x调用y”/“y被x调用”语义，但我很难猜测所以我在这里问。 文档似乎没有帮助。

对于长篇文章感到抱歉，但我希望所有这些背景可能对其他人有帮助或作为参考。

Answer 1

好吧，让我们暂时忽略调用者和被调用者调用图之间的区别，主要是因为当我在我的机器上比较这两个选项之间的结果时，我只看到kernel.kallsyms DSO中的效果，原因我不明白 - - 我自己比较新。

我发现，对于你的例子，读整个树更容易一些。 所以，使用--stdio ，让我们看一下__sin_sse2的整个树：

# Overhead    Command      Shared Object                  Symbol
# ........  .........  .................  ......................
#
    94.72%  perf-test  libm-2.19.so       [.] __sin_sse2
            |
            --- __sin_sse2
               |
               |--44.20%-- foo
               |          |
               |           --100.00%-- main
               |                     __libc_start_main
               |                     _start
               |                     0x0
               |
               |--27.95%-- baz
               |          |
               |          |--51.78%-- bar
               |          |          foo
               |          |          main
               |          |          __libc_start_main
               |          |          _start
               |          |          0x0
               |          |
               |           --48.22%-- foo
               |                     main
               |                     __libc_start_main
               |                     _start
               |                     0x0
               |
                --27.84%-- bar
                          |
                           --100.00%-- foo
                                     main
                                     __libc_start_main
                                     _start
                                     0x0

所以，我读这个的方式是：44％的时间， sin从foo调用; 27％的时间是从baz打来的，27％来自酒吧。

-g的文档很有启发性：

 -g [type,min[,limit],order[,key]], --call-graph
       Display call chains using type, min percent threshold, optional print limit and order. type can be either:

       ·   flat: single column, linear exposure of call chains.

       ·   graph: use a graph tree, displaying absolute overhead rates.

       ·   fractal: like graph, but displays relative rates. Each branch of the tree is considered as a new profiled object.

               order can be either:
               - callee: callee based call graph.
               - caller: inverted caller based call graph.

               key can be:
               - function: compare on functions
               - address: compare on individual code addresses

               Default: fractal,0.5,callee,function.

这里的重要部分是默认为分形，在分形模式下，每个分支都是一个新对象。

所以，你可以看到有50％的时间被调用baz ，它是从bar调用的，另外50％是从foo调用的。

这并不总是最有用的度量，因此使用-g graph查看结果是有益的：

94.72%  perf-test  libm-2.19.so       [.] __sin_sse2
        |
        --- __sin_sse2
           |
           |--41.87%-- foo
           |          |
           |           --41.48%-- main
           |                     __libc_start_main
           |                     _start
           |                     0x0
           |
           |--26.48%-- baz
           |          |
           |          |--13.50%-- bar
           |          |          foo
           |          |          main
           |          |          __libc_start_main
           |          |          _start
           |          |          0x0
           |          |
           |           --12.57%-- foo
           |                     main
           |                     __libc_start_main
           |                     _start
           |                     0x0
           |
            --26.38%-- bar
                      |
                       --26.17%-- foo
                                 main
                                 __libc_start_main
                                 _start
                                 0x0

这会改为使用绝对百分比，其中报告该调用链的每个百分比时间：因此foo->bar是总滴答数的26％（反过来称为baz ），而foo->baz （直接）为12％总滴答声。

从__sin_sse2的角度来看，我仍然不知道为什么我看不出被调用者和调用者图表之间的任何差异。

更新

我从命令行改变的一件事是如何收集调用图。 Linux perf默认使用重构callstacks的帧指针方法。 当编译器使用-fomit-frame-pointer作为默认值时，这可能是一个问题。 所以我用过

perf record --call-graph dwarf ./perf-test

了解Linux perf报告输出

问题描述

愚蠢的考验

平面剖析

分层分析

1 个解决方案

解决方案1
7 2015-02-04 15:08:49

更新

了解Linux perf报告输出

问题描述

愚蠢的考验

平面剖析

分层分析

1 个解决方案

解决方案1 7 2015-02-04 15:08:49

更新

解决方案1
7 2015-02-04 15:08:49