Running a fixed number of threads

Question

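With the new C++17 standard, I wonder whether there is a good way to run work on a fixed number of threads until a batch of jobs is finished.

Can you tell me how to achieve the desired functionality of this code: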

std::vector<std::future<std::string>> futureStore;
const int batchSize             = 1000;
const int maxNumParallelThreads = 10;
int threadsTerminated           = 0;

while(threadsTerminated < batchSize)
{
    const int& threadsRunning = futureStore.size();
    while(threadsRunning < maxNumParallelThreads)
    {
        futureStore.emplace_back(std::async(someFunction));
    }
    for(std::future<std::string>& readyFuture: std::when_any(futureStore.begin(), futureStore.end()))
    {
        auto retVal = readyFuture.get(); 
        // (possibly do something with the ret val)
        threadsTerminated++;
    }
}

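I read that there used to be a std::when_any function, but it did not make it into the standard library.

Is there any support for this functionality in the current standard library (not necessarily for std::future-s)? Is there a way to implement it easily, or do I have to resort to a workaround?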

Answer 1

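To me, this does not look like the ideal approach:

All your main thread does is wait for the other threads to finish, polling the futures for results. In a way, that thread is pretty much wasted...
I don't know to what extent std::async reuses the underlying thread infrastructure in any suitable way, so you risk creating a brand-new thread every time... (Besides, you might not create any thread at all if you do not explicitly specify std::launch::async.)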
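The remark about the launch policy can be checked with a small sketch. The helper name ranOnOtherThread is hypothetical and only serves the illustration; it compares the thread id seen inside the task with the caller's id.

```cpp
#include <future>
#include <thread>

// Hypothetical helper: returns true if the task ran on a thread
// other than the one that called std::async.
bool ranOnOtherThread(std::launch policy)
{
    const std::thread::id caller = std::this_thread::get_id();
    std::future<std::thread::id> f =
        std::async(policy, [] { return std::this_thread::get_id(); });
    // With std::launch::deferred the task only runs here, inside get(),
    // on the calling thread.
    return f.get() != caller;
}
```

ranOnOtherThread(std::launch::async) should return true and ranOnOtherThread(std::launch::deferred) should return false. When no policy is passed, the implementation is free to choose either behaviour.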

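Personally, I would prefer another approach:

Create all the threads you are going to use once, up front.
Let each thread run a loop that calls someFunction() repeatedly until the desired number of tasks has been reached.

The implementation could look similar to this example: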

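#include <chrono>
#include <cstdio>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

const int BatchSize = 20;
int tasksStarted = 0;              // guarded by mutex
std::mutex mutex;
std::vector<std::string> results;  // guarded by mutex

std::string someFunction()
{
    std::puts("worker started"); std::fflush(stdout);
    std::this_thread::sleep_for(std::chrono::seconds(2)); // simulate work
    std::puts("worker done"); std::fflush(stdout);
    return "";
}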

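void runner()
{
    {
        // Claim the first task, or leave if the batch is already fully claimed.
        std::lock_guard<std::mutex> lk(mutex);
        if(tasksStarted >= BatchSize)
            return;
        ++tasksStarted;
    }
    for(;;)
    {
        std::string s = someFunction(); // run the task outside the lock
        {
            // Store the result and claim the next task in one critical section.
            std::lock_guard<std::mutex> lk(mutex);
            results.push_back(s);
            if(tasksStarted >= BatchSize)
                break;
            ++tasksStarted;
        }
    }
}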

int main(int argc, char* argv[])
{
    const int MaxNumParallelThreads = 4;

    std::thread threads[MaxNumParallelThreads - 1]; // main thread is one, too!
    for(int i = 0; i < MaxNumParallelThreads - 1; ++i)
    {
        threads[i] = std::thread(&runner);
    }
    runner();

    for(int i = 0; i < MaxNumParallelThreads - 1; ++i)
    {
        threads[i].join();
    }

    // use results...

    return 0;
}

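This way, you do not re-create a thread for every task; the same threads simply keep going until all tasks are done.

If the tasks are not all alike as in the example above, you can create a base class Task with a pure virtual function (e.g. "execute" or "operator()") and derive subclasses that provide the required implementation (and hold any necessary data).

You can then place the instances into a std::vector or std::list (well, we won't iterate, so a list might be appropriate here...) as pointers (otherwise you get object slicing!), and let each thread, once it has finished its previous task, remove one of the tasks (do not forget to protect against race conditions!) and execute it. As soon as no tasks are left, return...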
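The Task idea described above could be sketched roughly as follows. This is only an illustration under assumptions: the names EchoTask and runAll are hypothetical, a std::queue of std::unique_ptr stands in for the vector or list of pointers, and every task is assumed to return a std::string.

```cpp
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Base class: each kind of job derives from it.
struct Task
{
    virtual ~Task() = default;
    virtual std::string execute() = 0;
};

// Hypothetical example task that carries its own data.
struct EchoTask : Task
{
    explicit EchoTask(std::string t) : text(std::move(t)) {}
    std::string execute() override { return text; }
    std::string text;
};

// Runs all queued tasks on numThreads threads (the calling thread counts
// as one of them) and returns the results in completion order.
std::vector<std::string> runAll(std::queue<std::unique_ptr<Task>> tasks,
                                unsigned numThreads)
{
    std::mutex mutex;                  // guards tasks and results
    std::vector<std::string> results;

    auto worker = [&] {
        for (;;)
        {
            std::unique_ptr<Task> task;
            {
                std::lock_guard<std::mutex> lk(mutex);
                if (tasks.empty())
                    return;            // no more work left
                task = std::move(tasks.front());
                tasks.pop();
            }
            std::string r = task->execute();   // run outside the lock
            std::lock_guard<std::mutex> lk(mutex);
            results.push_back(std::move(r));
        }
    };

    std::vector<std::thread> threads;
    for (unsigned i = 1; i < numThreads; ++i)
        threads.emplace_back(worker);
    worker();                          // the calling thread works, too
    for (std::thread& t : threads)
        t.join();
    return results;
}
```

A caller would fill the queue with std::make_unique<EchoTask>(...) instances and pass it to runAll with std::move. Exception handling is left out of this sketch; a task that throws would terminate the program.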

Answer 2

If you do not care about the exact number of threads, the simplest solution would be:

std::vector<std::future<std::string>> futureStore(
    batchSize
);

std::generate(futureStore.begin(), futureStore.end(), [](){return std::async(someTask);});


for(auto& future : futureStore) {
    std::string value = future.get();
    doWork(value);
}

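In my experience, std::async reuses threads once a certain number of them has been spawned. It will not spawn 1000 threads. Also, you will not gain much performance (if any) by using a thread pool. I took measurements in the past, and the overall run time was nearly identical.

The only reason I use thread pools now is to avoid the delay of thread creation inside the computation loop. If you have timing constraints, you may miss a deadline the first time you use std::async, because it creates the threads on the first calls.

There is a good thread pool library for such applications. Have a look here: https://github.com/vit-vit/ctpl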

#include <ctpl.h>

const unsigned int numberOfThreads = 10;
const unsigned int batchSize = 1000;

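// The pool size is the number of worker threads, not the number of jobs.
// CTPL passes the worker's thread id (int) as the first argument to someTask.
ctpl::thread_pool pool(numberOfThreads);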
std::vector<std::future<std::string>> futureStore(
    batchSize
);

std::generate(futureStore.begin(), futureStore.end(), [](){ return pool.push(someTask);});

for(auto& future : futureStore) {
    std::string value = future.get();
    doWork(value);
}
