How to interpret a perf report

Question

I'm learning how to use the tool perf to profile my C++ project. Here is my code:

#include <iostream>
#include <thread>
#include <mutex>
#include <vector>
#include <memory>   // std::unique_ptr


std::mutex mtx;
long long_val = 0;

void do_something(long &val)
{
    std::unique_lock<std::mutex> lck(mtx);
    for(int j=0; j<1000; ++j)
        val++;
}


void thread_func()
{
    for(int i=0; i<1000000L; ++i)
    {
        do_something(long_val);
    }
}


int main(int argc, char* argv[])
{
    std::vector<std::unique_ptr<std::thread>> threads;
    for(int i=0; i<100; ++i)
    {
        threads.push_back(std::move(std::unique_ptr<std::thread>(new std::thread(thread_func))));
    }
    for(int i=0; i<100; ++i)
    {
        threads[i]->join();
    }
    threads.clear();
    std::cout << long_val << std::endl;
    return 0;
}

To compile it, I run g++ -std=c++11 main.cpp -lpthread -g, which gives me an executable named a.out.

Then I run perf record --call-graph dwarf -- ./a.out, wait 10 seconds, and press Ctrl+C to interrupt ./a.out because it takes too long to finish.

Finally, I run perf report -g graph --no-children, and this is the output:

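My goal is to find out which part of the code is the heaviest. This output seems to tell me that do_something is the heaviest part (46.25%). But when I expand do_something, I can't make sense of the entries under it: std::_Bind_simple, std::thread::_Impl, and so on.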

So how can I get more useful information out of the perf report output? Or is there nothing more to learn beyond the fact that do_something is the heaviest?

Answer 1

With help from @Peter Cordes, I came up with this answer. If you have something more useful, please feel free to post your own answer.

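You forgot to enable optimization when compiling, so all the small functions that would normally be inlined are actually being called. Add -O3, or at least -O2, to your g++ command line. Optionally, also use profile-guided optimization if you really want gcc to do a good job on the hot loops.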

After adding -O3, the perf report output becomes:

Now futex_wake and futex_wait_setup tell us something useful, because on Linux a C++11 std::mutex is implemented on top of futex. So the conclusion is that the mutex is the hotspot in this code.
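If the mutex really is the hotspot, one thing to try is taking it less often. The following is only a minimal sketch, not the original program: it assumes the work can be accumulated in a thread-private variable and merged once at the end, which may not hold for a real workload. The function name count_with_local_accumulation and its parameters are made up for illustration.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical variant of the question's program: each thread counts into a
// thread-private variable and takes the mutex once, instead of once per call.
long count_with_local_accumulation(int num_threads, long iterations_per_thread)
{
    std::mutex mtx;
    long total = 0;

    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t)
    {
        threads.emplace_back([&mtx, &total, iterations_per_thread]() {
            long local = 0;
            for (long i = 0; i < iterations_per_thread; ++i)
                ++local;                          // no lock: thread-private data
            std::lock_guard<std::mutex> lck(mtx);
            total += local;                       // one lock acquisition per thread
        });
    }
    for (auto &th : threads)
        th.join();                                // all threads done before we return
    return total;
}
```

Calling count_with_local_accumulation(100, 1000000L) from main and profiling it again with perf should show whether the futex entries shrink. Treat that as something to measure, not a guaranteed result.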

Answer 2

The problem here is that your threads are all waiting on the same mutex, which forces your program to hit the scheduler often.

You would get better performance if you used fewer threads.
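As a rough sketch of the "fewer threads" idea, the thread count can be taken from the hardware instead of being fixed at 100, with the same total number of locked increments split across the threads. The helper names pick_thread_count and run_with_threads are hypothetical, and the fallback of 2 threads is an arbitrary assumption for when the hardware count is unknown.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: choose a thread count from the hardware.
// hardware_concurrency() may return 0 when the value cannot be determined.
unsigned pick_thread_count()
{
    unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 2 : n;   // fallback value is an assumption
}

// Performs total_calls locked increments, split across num_threads threads.
long run_with_threads(unsigned num_threads, long total_calls)
{
    if (num_threads == 0)
        num_threads = 1;                       // avoid dividing by zero below
    const long n = static_cast<long>(num_threads);

    std::mutex mtx;
    long val = 0;
    std::vector<std::thread> threads;
    for (long t = 0; t < n; ++t)
    {
        // Give the first (total_calls % n) threads one extra call each.
        const long calls = total_calls / n + (t < total_calls % n ? 1 : 0);
        threads.emplace_back([&mtx, &val, calls]() {
            for (long i = 0; i < calls; ++i)
            {
                std::lock_guard<std::mutex> lck(mtx);
                ++val;
            }
        });
    }
    for (auto &th : threads)
        th.join();
    return val;
}
```

Timing run_with_threads(100, N) against run_with_threads(pick_thread_count(), N) for the same N is one way to check whether the thread count matters on a given machine.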
