Is there a performance benefit in using a pool of threads over simply creating threads?

I was wondering if there are performance benefits to using a pool of threads over simply creating threads and allowing the OS to queue and schedule them.

Say I have 20 available threads and 60 tasks I want to run on them, with something like:

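#include <algorithm>
#include <thread>
#include <vector>

void someTask() {
  // ...performs some task
}

int main() {
  // say std::thread::hardware_concurrency() == 20
  std::vector<std::thread> threads;
  for (int i = 0; i < 60; i++) {
    // one new OS thread per task (the original called an undefined someFunc)
    threads.push_back(std::thread(someTask));
  }

  // wait for all 60 threads to finish
  std::for_each(threads.begin(), threads.end(), [](std::thread& x) { x.join(); });
}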

Is there a benefit to instead creating a pool of 20 threads and giving each of them another task whenever it becomes free? I assume there is some overhead in spawning a thread, but are there other benefits to creating a pool for such a problem?

Creating a thread typically takes about 75k cycles (~20us).

Starting that thread can take 200k cycles (~60us).

Waking up a sleeping thread takes about 15k cycles (~5us).
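To put rough numbers on this for the 60 tasks in the question, here is a back-of-the-envelope sketch. It assumes the per-task costs simply add up, which overstates the total because the work partly overlaps across cores; the helper name `total_dispatch_cycles` and the constants are made up for illustration and only restate the figures quoted above.

```cpp
#include <cstdint>

// Rough, machine-dependent figures quoted above.
constexpr std::uint64_t kStartCycles = 200000;  // fresh thread until it runs
constexpr std::uint64_t kWakeCycles = 15000;    // waking a sleeping thread

// Hypothetical helper: dispatch cost summed over all tasks, ignoring overlap.
constexpr std::uint64_t total_dispatch_cycles(std::uint64_t per_task,
                                              std::uint64_t tasks) {
    return per_task * tasks;
}

// 60 tasks on fresh threads:  60 * 200000 = 12000000 cycles
// 60 tasks on woken threads:  60 *  15000 =   900000 cycles
static_assert(total_dispatch_cycles(kStartCycles, 60) == 12000000, "spawn");
static_assert(total_dispatch_cycles(kWakeCycles, 60) == 900000, "wake");
```

Under these assumptions, dispatching to an already-running thread is roughly 13 times cheaper per task than starting a new one.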

So it is worth pre-creating threads and just waking them up rather than creating a new thread for every task. The following program measures the three costs:

#include <iostream>
#include <thread>
#include <cstdint>
#include <mutex>
#include <chrono>
#include <condition_variable>

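// Read the CPU timestamp counter (x86 only, GCC/Clang builtin).
uint64_t now() {
    return __builtin_ia32_rdtsc();
}

uint64_t t0 = 0;   // just before the std::thread constructor
uint64_t t1 = 0;   // first statement executed inside the new thread
uint64_t t2 = 0;   // std::thread constructor has returned
uint64_t t3 = 0;   // just before the sleeping thread is notified
uint64_t t4 = 0;   // the sleeping thread has woken up
double sum01 = 0;  // cycles from creation until the thread runs
double sum02 = 0;  // cycles spent in the std::thread constructor
double sum34 = 0;  // cycles needed to wake the thread
uint64_t count = 0;
std::mutex m;
std::condition_variable cv;
bool wake = false;  // guarded by m; protects against lost or spurious wakeups

void run() {
    t1 = now();
    std::unique_lock<std::mutex> lk(m);
    // Waiting on a predicate avoids a deadlock if the notification
    // arrives before this thread has started waiting.
    cv.wait(lk, [] { return wake; });
    t4 = now();
}

void create_thread() {
    t0 = now();
    std::thread th( run );
    t2 = now();
    // Give the new thread time to reach cv.wait() and go to sleep.
    std::this_thread::sleep_for( std::chrono::microseconds(100));
    {
        std::lock_guard<std::mutex> lk(m);
        wake = true;
        t3 = now();  // taken under the lock, so t4 is always read later
    }
    cv.notify_one();
    th.join();
    wake = false;  // reset for the next iteration; no other thread is alive
    count++;
    sum01 += (t1-t0);
    sum02 += (t2-t0);
    sum34 += (t4-t3);
}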

int main() {
    const uint32_t numloops = 10;
    for ( uint32_t j=0; j<numloops; ++j ) {
        create_thread();
    }
    std::cout << "t01:" << sum01/count << std::endl;
    std::cout << "t02:" << sum02/count << std::endl;
    std::cout << "t34:" << sum34/count << std::endl;
}

Typical result (average cycles per iteration):

Program returned: 0
t01:64614.4
t02:54655
t34:15758.4

Source: https://godbolt.org/z/recfjKe8x
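For the question's scenario, the "pre-create and wake" idea could look roughly like the following. This is only a minimal sketch, not code from the answer: the `ThreadPool` class and the `run_tasks_on_pool` helper are hypothetical names, and a production pool would also need things like exception handling and return values.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool: a fixed set of workers that sleep on a
// condition variable and are woken when a task is queued.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t num_threads) {
        for (std::size_t i = 0; i < num_threads; ++i) {
            workers_.emplace_back([this] { worker_loop(); });
        }
    }

    // Lets the workers drain the queue, then joins them.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lk(m_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (std::thread& w : workers_) w.join();
    }

    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

    // Queue a task and wake one sleeping worker (the cheap "wake" path).
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lk(m_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run the task outside the lock
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};

// Runs num_tasks trivial tasks on num_threads workers and returns how
// many of them actually executed.
int run_tasks_on_pool(std::size_t num_threads, int num_tasks) {
    std::atomic<int> done{0};
    {
        ThreadPool pool(num_threads);
        for (int i = 0; i < num_tasks; ++i) {
            pool.submit([&done] { ++done; });
        }
    }  // the pool destructor waits for all queued tasks
    return done.load();
}
```

Calling `run_tasks_on_pool(20, 60)` mirrors the question: the 20 threads are created once, and each of the 60 tasks only pays for a queue push and a wake-up rather than a thread creation.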
