
Is there a performance benefit in using a pool of threads over simply creating threads?

I was wondering whether there is a performance benefit to using a pool of threads over simply creating threads and letting the OS queue and schedule them.

Say I have 20 hardware threads available and 60 tasks I want to run on them, with something like this:

void someTask() {

  //...performs some task

}

// say std::thread::hardware_concurrency() = 20

std::vector<std::thread> threads;
for (int i = 0; i < 60; i++) {
  threads.push_back(std::thread(someTask));
}

std::for_each(threads.begin(),threads.end(),[](std::thread& x){x.join();});

Is there a benefit to instead creating a pool with 20 threads and giving each of these another 'task' when a thread becomes free? I assume that there is some overhead in spawning a thread, but are there other benefits to creating a pool for such a problem?

Creating a thread typically takes about 75k cycles (~20us).

Starting that thread can take about 200k cycles (~60us).

Waking up an already running but sleeping thread takes about 15k cycles (~5us).

So it is worth creating the threads once and just waking them up when there is work, instead of creating a new thread every time: a wake-up is several times cheaper than creation plus start-up.

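// Benchmark: average cost, in TSC cycles, of creating, starting and waking a thread.
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <iostream>
#include <mutex>
#include <thread>

// Read the CPU timestamp counter (GCC/Clang builtin, x86 only).
uint64_t now() {
    return __builtin_ia32_rdtsc();
}

uint64_t t0 = 0;    // just before the std::thread constructor
uint64_t t1 = 0;    // first statement executed in the new thread
uint64_t t2 = 0;    // std::thread constructor has returned
uint64_t t3 = 0;    // just before the sleeping thread is notified
uint64_t t4 = 0;    // the notified thread is running again
double sum01 = 0;   // construction until the new thread runs
double sum02 = 0;   // duration of the constructor call
double sum34 = 0;   // wake-up latency
uint64_t count = 0;
bool wake = false;  // guarded by m; protects against lost and spurious wake-ups
std::mutex m;
std::condition_variable cv;

void run() {
    t1 = now();
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [] { return wake; });
    t4 = now();
}

void create_thread() {
    wake = false;  // safe without the lock: no worker thread exists yet
    t0 = now();
    std::thread th(run);
    t2 = now();
    // Give the new thread time to block on the condition variable.
    std::this_thread::sleep_for(std::chrono::microseconds(100));
    t3 = now();
    {
        std::lock_guard<std::mutex> lk(m);
        wake = true;
    }
    cv.notify_one();
    th.join();  // after join, t1 and t4 written by the thread are visible here
    count++;
    sum01 += (t1 - t0);
    sum02 += (t2 - t0);
    sum34 += (t4 - t3);
}

int main() {
    const uint32_t numloops = 10;
    for (uint32_t j = 0; j < numloops; ++j) {
        create_thread();
    }
    std::cout << "t01:" << sum01 / count << '\n';
    std::cout << "t02:" << sum02 / count << '\n';
    std::cout << "t34:" << sum34 / count << '\n';
}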

Typical result (average TSC cycles per iteration):

Program returned: 0
t01:64614.4
t02:54655
t34:15758.4

Source: https://godbolt.org/z/recfjKe8x
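
To connect this back to the question, here is a minimal sketch of what "pre-creating threads and just waking them up" could look like. It is not taken from the answer: ThreadPool, submit and run_tasks_on_pool are hypothetical names, and the design (one mutex-protected queue, one condition variable) is just the simplest arrangement that works, not a tuned implementation.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool: a fixed set of workers is created once and
// each worker sleeps on a condition variable until a task is queued.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t workers) {
        for (std::size_t i = 0; i < workers; ++i) {
            threads_.emplace_back([this] { worker_loop(); });
        }
    }

    // Lets the workers drain the queue, then joins all of them.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lk(m_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& t : threads_) {
            t.join();
        }
    }

    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

    // Queue a task and wake one sleeping worker.
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lk(m_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) {
                    return;  // only reachable once stopping_ is set
                }
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can proceed
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    std::vector<std::thread> threads_;
    bool stopping_ = false;
};

// Runs `tasks` trivial jobs on `workers` threads and returns how many ran.
int run_tasks_on_pool(std::size_t workers, int tasks) {
    std::atomic<int> done{0};
    {
        ThreadPool pool(workers);
        for (int i = 0; i < tasks; ++i) {
            pool.submit([&done] { ++done; });
        }
    }  // the destructor returns only after every queued task has run
    return done.load();
}
```

With the question's numbers, run_tasks_on_pool(20, 60) would pay the thread creation and start-up cost 20 times instead of 60, and each of the remaining hand-offs costs at most a wake-up. Whether that matters in practice depends on how long each task runs compared with those overheads.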


 