简体   繁体   English

在 C++ 中分别测量每个线程上花费的 CPU 时间

[英]Measure CPU time spent on each thread separately in C++

I know that this question sounds like an easy question and a duplicate of former ones, in which boost.timer and the chrono facility of C++11 are given as answers.我知道这个问题听起来像一个简单的问题,并且与以前的问题重复,其中 boost.timer 和 C++11 的计时工具作为答案给出。

But, what I have in mind is a bit different and I found no answer to it either on StackOverflow or elsewhere:但是,我的想法有点不同,我在 StackOverflow 或其他地方都没有找到答案:

In my (C++11) program on Ubuntu Linux, I start several threads with std::async and the std::future mechanism.在我关于 Ubuntu Linux 的 (C++11) 程序中,我使用 std::async 和 std::future 机制启动了几个线程。

Inside every thread I measure CPU-Time with boost.timer().在每个线程中,我使用 boost.timer() 测量 CPU-Time。 If I start only one thread I get a CPU time of (in my example) ~0.39 sec and an equal WC time of ~0.39 sec.如果我只启动一个线程,我的 CPU 时间(在我的示例中)约为 0.39 秒,WC 时间相等,约为 0.39 秒。

If I start several threads I get a longer WC time for each, say 0.8 sec for 16 threads and now the CPU-time for each is about 6.4 sec, that is 8 * 0.8 sec (I have a quad-core Xeon CPU).如果我启动多个线程,每个线程的 WC 时间会更长,例如 16 个线程为 0.8 秒,现在每个线程的 CPU 时间约为 6.4 秒,即 8 * 0.8 秒(我有一个四核 Xeon CPU)。

So the CPU-Time of each thread is seemingly multiplied by the (number of CPU cores) * 2.所以每个线程的 CPU-Time 似乎乘以(CPU 核心数)* 2。

Of course(?) I would like to see a CPU-time near 0.39 sec for each thread, as this is probably still the time the thread uses the CPU for its purposes.当然(?)我希望每个线程的 CPU 时间接近 0.39 秒,因为这可能仍然是线程将 CPU 用于其目的的时间。 The longer CPU time shown (multiplied with the "CPU number factor") is not so much of help in gauging the true CPU consumption of each thread separately.显示的较长 CPU 时间(乘以“CPU 数量因子”)对于分别衡量每个线程的真实 CPU 消耗没有太大帮助。

For illustration I append my test program and its output, first for one thread, then for 16 threads.为了说明我的 append 我的测试程序及其 output,首先是一个线程,然后是 16 个线程。

So my question is: What can I do, which library, function or programming technique can I use, to get the true CPU usage of each thread which should not change much with the number of threads started?所以我的问题是:我能做些什么,我可以使用哪个库、function 或编程技术来获得每个线程的真实 CPU 使用率,它不应该随着线程数的启动而发生太大变化?

#include <iostream>
#include <fstream>

#include <vector>
#include <cmath>

#include <future>
#include <mutex>
#include <chrono>

#include <boost/timer/timer.hpp>

std::mutex mtx;


class XTimer
{
public:
    XTimer() {};

    void start();
    void stop();

    double cpu_time();
    double boost_cpu_time();
    double wc_time();

    std::chrono::time_point<std::chrono::system_clock> timestamp_wc;
    std::chrono::time_point<std::chrono::steady_clock> timestamp_cpu;

    boost::timer::cpu_timer timer_cpu;

    double wc_time_val;
    double cpu_time_val;

    double boost_cpu_time_val;

};

void XTimer::start()
{
    timestamp_wc = std::chrono::system_clock::now();
    timestamp_cpu = std::chrono::steady_clock::now();

    timer_cpu.start();

    cpu_time_val = 0;
    wc_time_val = 0;

    boost_cpu_time_val = 0;


}

void XTimer::stop()
{
    const auto ns_wc = std::chrono::system_clock::now() - timestamp_wc;
    const auto ns_cpu = std::chrono::steady_clock::now() - timestamp_cpu;

    auto elapsed_times(timer_cpu.elapsed());

    auto cpu_elapsed(elapsed_times.system + elapsed_times.user);

    //std::cout << "boost: cpu elapsed = " << cpu_elapsed << std::endl;

    wc_time_val = double(ns_wc.count())/1e9;
    cpu_time_val = double(ns_cpu.count())/1e9;

    boost_cpu_time_val = double(cpu_elapsed)/1e9;

}


double XTimer::cpu_time()
{
    return cpu_time_val;
}

double XTimer::boost_cpu_time()
{
    return boost_cpu_time_val;
}

double XTimer::wc_time()
{
    return wc_time_val;
}




template<class T>
int wait_for_all(std::vector<std::future<T>> & fuvec)
{
    std::vector<T> res;

    for(auto & fu: fuvec) {
        res.push_back(fu.get());
    }
    return res.size();

}


int test_thread(int a)
{
    const int N = 10000000;

    double x = 0;

    XTimer tt;

    do {
        std::lock_guard<std::mutex> lck {mtx}; 
        std::cout << "start thread: " << a << std::endl;
    } while (0);

    tt.start(); 

    for(int i = 0; i < N; ++i) {

        if (i % 10000 == 0) {
            //std::cout << (char((int('A') + a)));
        }

        x += sin(i);
    }

    tt.stop();

    do {
        std::lock_guard<std::mutex> lck {mtx};

        std::cout << "end thread: " << a << std::endl;
        std::cout << "boost cpu = " << tt.boost_cpu_time() << " wc = " << tt.wc_time() << std::endl;
    } while (0);

    return 0;
}


int test_threads_start(int num_threads)
{

    std::vector<std::future<int>> fivec;

    XTimer tt;

    tt.start();

    for(int i = 0; i < num_threads; ++i) {
        fivec.push_back(std::async(test_thread, i));
    }

    int sz = wait_for_all(fivec);

    tt.stop();

    std::cout << std::endl << std::endl;

    std::cout << "all threads finished: total wc time = " << tt.wc_time() << std::endl;
    std::cout << "all threads finished: total boost cpu time = " << tt.boost_cpu_time() << std::endl;

}


int main(int argc, char** argv)
{
    const int num_threads_default = 1;
    int num_threads = num_threads_default;

    //boost::timer::auto_cpu_timer ac;

    if (argc > 1) {
        num_threads = atoi(argv[1]);
    }

    std::cout << "starting " << num_threads << " threads." << std::endl;

    test_threads_start(num_threads);

    std::cout << "end." << std::endl;

    return 0;
}

I can be compiled by我可以编译

g++ -o testit testit.cpp -L/usr/lib/x86_64-linux-gnu -pthread -lboost_timer -lboost_system -lboost_thread 

Sample output with 1 thread样品 output 带 1 个线程

starting 1 threads.
start thread: 0
end thread: 0
boost cpu = 0.37 wc = 0.374107


all threads finished: total wc time = 0.374374
all threads finished: total boost cpu time = 0.37

Sample output with 16 threads样品 output 16 线程

starting 16 threads.
start thread: 0
start thread: 1
start thread: 2
start thread: 3
start thread: 4
start thread: 10
start thread: 5
start thread: 7
start thread: 6
start thread: 11
start thread: 8
start thread: 9
start thread: 13
start thread: 12
start thread: 14
start thread: 15
end thread: 1
boost cpu = 4.67 wc = 0.588818
end thread: 2
boost cpu = 5.29 wc = 0.66638
end thread: 0
boost cpu = 5.72 wc = 0.7206
end thread: 13
boost cpu = 5.82 wc = 0.728717
end thread: 11
boost cpu = 6.18 wc = 0.774979
end thread: 12
boost cpu = 6.17 wc = 0.773298
end thread: 6
boost cpu = 6.32 wc = 0.793143
end thread: 15
boost cpu = 6.12 wc = 0.767049
end thread: 4
boost cpu = 6.7 wc = 0.843377
end thread: 14
boost cpu = 6.74 wc = 0.84842
end thread: 3
boost cpu = 6.91 wc = 0.874065
end thread: 9
boost cpu = 6.83 wc = 0.86342
end thread: 5
boost cpu = 7 wc = 0.896873
end thread: 7
boost cpu = 7.05 wc = 0.917324
end thread: 10
boost cpu = 7.11 wc = 0.930335
end thread: 8
boost cpu = 7.03 wc = 0.940374


all threads finished: total wc time = 0.957748
all threads finished: total boost cpu time = 7.14
end.

Documentation of boost::timer does not mention anything about per thread measurements. boost::timer 的文档没有提到任何关于每个线程测量的内容。 Fortunately boost::chrono contains thread_clock which gives per thread CPU usage on platforms which support it.幸运的是 boost::chrono 包含thread_clock ,它在支持它的平台上提供每个线程的 CPU 使用率。 It uses the same interface as the std::chrono clocks and measures thread wall clock.它使用与 std::chrono 时钟相同的接口并测量线程挂钟。

After adding following lines to your example code:在示例代码中添加以下行后:

// Includes section
#include <boost/chrono.hpp>

// XTimer
boost::chrono::thread_clock::time_point timestamp_thread_wc;
double thread_wc_time_val;

// XTimer::start()
timestamp_thread_wc = boost::chrono::thread_clock::now();

// XTimer::stop()
const auto ns_thread_wc = boost::chrono::thread_clock::now() - timestamp_thread_wc;
thread_wc_time_val = double(ns_thread_wc.count())/1e9;

// test_thread() just after for loop
sleep(1);

// test_thread() in bottom do -> while(0) loop
std::cout << "thread cpu = " << tt.thread_wc_time_val << std::endl;

and compiling with additional -lboost_chrono option I get:并使用附加-lboost_chrono选项进行编译,我得到:

starting 1 threads.
start thread: 0
end thread: 0
boost cpu = 0.16 wc = 1.16715
thread cpu = 0.166943


all threads finished: total wc time = 1.16754
all threads finished: total boost cpu time = 0.16
end.

and:和:

starting 2 threads.
start thread: 0
start thread: 1
end thread: 1
boost cpu = 0.28 wc = 1.14168
thread cpu = 0.141524
end thread: 0
boost cpu = 0.28 wc = 1.14417
thread cpu = 0.14401


all threads finished: total wc time = 1.14442
all threads finished: total boost cpu time = 0.28
end.

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM