
How to interpret the report of perf

I'm learning how to use the tool perf to profile my C++ project. Here is my code:

#include <iostream>
#include <memory>   // std::unique_ptr
#include <thread>
#include <mutex>
#include <vector>


std::mutex mtx;
long long_val = 0;

void do_something(long &val)
{
    std::unique_lock<std::mutex> lck(mtx);
    for(int j=0; j<1000; ++j)
        val++;
}


void thread_func()
{
    for(int i=0; i<1000000L; ++i)
    {
        do_something(long_val);
    }
}


int main(int argc, char* argv[])
{
    std::vector<std::unique_ptr<std::thread>> threads;
    for(int i=0; i<100; ++i)
    {
        threads.push_back(std::move(std::unique_ptr<std::thread>(new std::thread(thread_func))));
    }
    for(int i=0; i<100; ++i)
    {
        threads[i]->join();
    }
    threads.clear();
    std::cout << long_val << std::endl;
    return 0;
}

To compile it, I run g++ -std=c++11 main.cpp -lpthread -g, which produces an executable named a.out.

Then I run perf record --call-graph dwarf -- ./a.out, wait about 10 seconds, and press Ctrl+C to interrupt ./a.out because it takes too long to finish.

Lastly, I run perf report -g graph --no-children, and here is the output:

[Screenshot: perf report output for the unoptimized build; image not available]

My goal is to find which part of the code is the heaviest. The output seems to say that do_something is the heaviest part (46.25%). But when I expand do_something, I cannot make sense of the entries under it: std::_Bind_simple, std::thread::_Impl, etc.

So how can I get more useful information from the output of perf report? Or is there nothing more to learn beyond the fact that do_something is the heaviest?

With the help of @Peter Cordes, I am posting this answer. If you have something more useful, please feel free to post your own answer.

You forgot to enable optimization at all when you compiled, so all the little functions that should normally inline away are actually getting called. Add -O3 or at least -O2 to your g++ command line. Optionally, also use profile-guided optimization if you really want gcc to do a good job on hot loops.

After adding -O3, the output of perf report becomes:

[Screenshot: perf report output for the -O3 build; image not available]

Now futex_wake and futex_wait_setup tell us something useful: on Linux, std::mutex in C++11 is built on futex, so these kernel functions show up when threads block on a contended mutex and get woken again. The conclusion is that the mutex is the hotspot in this code.

The issue here is that your threads are all waiting on the same mutex, which forces your program to hit the scheduler often.
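One possible way to cut that contention, sketched here under the assumption that only the final total matters and that nobody needs to observe intermediate values: each thread counts into a private variable and takes the lock once to publish its subtotal. The function name batched_sum and its parameters are hypothetical, not part of the original program.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper, not from the original code: every thread counts
// privately and acquires the mutex only once to publish its subtotal.
long batched_sum(int num_threads, long calls_per_thread, long incs_per_call)
{
    assert(num_threads > 0);
    std::mutex mtx;
    long total = 0;

    std::vector<std::thread> threads;
    threads.reserve(num_threads);
    for (int t = 0; t < num_threads; ++t)
    {
        threads.emplace_back([&mtx, &total, calls_per_thread, incs_per_call] {
            long local = 0;  // private to this thread, so no lock is needed
            for (long i = 0; i < calls_per_thread; ++i)
                for (long j = 0; j < incs_per_call; ++j)
                    ++local;
            std::lock_guard<std::mutex> lck(mtx);  // one acquisition per thread
            total += local;
        });
    }
    for (auto &th : threads)
        th.join();
    return total;
}
```

Calling batched_sum(100, 1000000, 1000) would compute the same total as the original program while locking 100 times instead of 100 million times. Whether that restructuring is acceptable depends on what the real workload needs from the shared value, so treat it as an illustration rather than a drop-in fix.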

You would get better performance if you used fewer threads.
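A minimal sketch of that idea, assuming it is acceptable to cap the thread count at what std::thread::hardware_concurrency() reports and to split the same number of do_something-style calls across the threads. The names capped_thread_count and run_capped are hypothetical, and whether the cap actually helps on a given machine is something to confirm with perf.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: never start more threads than the hardware reports.
unsigned capped_thread_count(unsigned requested)
{
    unsigned hw = std::thread::hardware_concurrency();  // may be 0 if unknown
    if (hw == 0)
        hw = 1;
    return requested < hw ? requested : hw;
}

// Hypothetical helper: performs total_calls locked batches of increments,
// split across the capped number of threads, and returns the final value.
long run_capped(unsigned requested_threads, long total_calls, long incs_per_call)
{
    assert(requested_threads > 0);
    const long n = static_cast<long>(capped_thread_count(requested_threads));
    std::mutex mtx;
    long val = 0;

    std::vector<std::thread> threads;
    threads.reserve(n);
    for (long t = 0; t < n; ++t)
    {
        // Spread the calls evenly; the first (total_calls % n) threads take one extra.
        const long calls = total_calls / n + (t < total_calls % n ? 1 : 0);
        threads.emplace_back([&mtx, &val, calls, incs_per_call] {
            for (long i = 0; i < calls; ++i)
            {
                std::lock_guard<std::mutex> lck(mtx);  // same locking pattern as do_something
                for (long j = 0; j < incs_per_call; ++j)
                    ++val;
            }
        });
    }
    for (auto &th : threads)
        th.join();
    return val;
}
```

With the original workload this would be called as run_capped(100, 100000000, 1000). The locking pattern is unchanged, so the mutex is still taken once per call; the only difference is that fewer threads compete for it at any moment.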
