Running a fixed number of threads

Question

With the new C++17 standard, I wonder if there is a good way to run a process with a fixed number of threads until a batch of jobs is finished.

Can you tell me how I can achieve the desired functionality of this code:

std::vector<std::future<std::string>> futureStore;
const int batchSize             = 1000;
const int maxNumParallelThreads = 10;
int threadsTerminated           = 0;

while(threadsTerminated < batchSize)
{
    const int& threadsRunning = futureStore.size();
    while(threadsRunning < maxNumParallelThreads)
    {
        futureStore.emplace_back(std::async(someFunction));
    }
    for(std::future<std::string>& readyFuture: std::when_any(futureStore.begin(), futureStore.end()))
    {
        auto retVal = readyFuture.get(); 
        // (possibly do something with the ret val)
        threadsTerminated++;
    }
}

I read that there used to be an std::when_any function, but it is a feature that did not make it into the standard.

Is there any support for this functionality (not necessarily for std::future-s) in the current standard libraries? Is there a way to easily implement it, or do I have to resort to something like this?

Answer 1

This does not seem to me to be the ideal approach:

All your main thread does is wait for the other threads to finish, polling the results of your futures. That thread is almost wasted.
I don't know how far std::async reuses the threads' infrastructure in any suitable way, so you risk creating entirely new threads each time. (Apart from that, you might not create any threads at all if you do not specify std::launch::async explicitly.)

I personally would prefer another approach:

Create all the threads you want to use at once.
Let each thread run a loop, repeatedly calling someFunction(), until you have reached the number of desired tasks.

The implementation might look similar to this example:
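
To illustrate the launch-policy point, here is a minimal sketch. The helper names idOfTask and demonstrateLaunchPolicies are made up for this example and do not come from the question. It compares the thread a task runs on under std::launch::async and std::launch::deferred; with the default policy the implementation may pick either.

```cpp
#include <cassert>
#include <future>
#include <thread>

// Hypothetical helper: runs a trivial task under the given launch policy
// and reports the id of the thread that executed it.
std::thread::id idOfTask(std::launch policy)
{
    std::future<std::thread::id> f =
        std::async(policy, [] { return std::this_thread::get_id(); });
    return f.get();
}

// Hypothetical helper: checks where each policy executes the task.
void demonstrateLaunchPolicies()
{
    const std::thread::id caller = std::this_thread::get_id();

    // std::launch::async runs the task as if on a new thread.
    assert(idOfTask(std::launch::async) != caller);

    // std::launch::deferred runs the task lazily, on the thread calling get().
    assert(idOfTask(std::launch::deferred) == caller);
}
```

So a call such as std::async(someFunction) without an explicit policy may never give you any parallelism at all; passing std::launch::async as the first argument removes that ambiguity.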

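#include <chrono>
#include <cstdio>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

const int BatchSize = 20;
int tasksStarted = 0;             // tasks claimed so far; guarded by mutex
std::mutex mutex;
std::vector<std::string> results; // guarded by mutex

// Dummy task standing in for the real work.
std::string someFunction()
{
    puts("worker started"); fflush(stdout);
    std::this_thread::sleep_for(std::chrono::seconds(2));
    puts("worker done"); fflush(stdout);
    return "";
}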

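// Each worker claims a task under the lock, runs it outside the lock,
// stores the result, and repeats until BatchSize tasks have been claimed.
void runner()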
{
    {
        std::lock_guard<std::mutex> lk(mutex);
        if(tasksStarted >= BatchSize)
            return;
        ++tasksStarted;
    }
    for(;;)
    {
        std::string s = someFunction();
        {
            std::lock_guard<std::mutex> lk(mutex);
            results.push_back(s);
            if(tasksStarted >= BatchSize)
                break;
            ++tasksStarted;
        }
    }
}

int main(int argc, char* argv[])
{
    const int MaxNumParallelThreads = 4;

    std::thread threads[MaxNumParallelThreads - 1]; // main thread is one, too!
    for(int i = 0; i < MaxNumParallelThreads - 1; ++i)
    {
        threads[i] = std::thread(&runner);
    }
    runner();

    for(int i = 0; i < MaxNumParallelThreads - 1; ++i)
    {
        threads[i].join();
    }

    // use results...

    return 0;
}

This way, you do not create a new thread for each task, but just continue until all tasks are done.

If these tasks are not all alike as in the above example, you might create a base class Task with a pure virtual function (e.g. "execute" or "operator()") and create subclasses with the required implementation (and holding any necessary data).

You could then place the instances into a std::vector or std::list (well, we won't iterate, so a list might be appropriate here...) as pointers (otherwise, you get object slicing!) and let each thread remove one of the tasks when it has finished its previous one (do not forget to protect against race conditions!) and execute it. As soon as no more tasks are left, return...
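
The task-queue idea above can be sketched as follows. This is only one possible shape under stated assumptions: the names EchoTask, SquareTask and runAll are invented for illustration, and a std::deque of std::unique_ptr is used instead of the vector or list mentioned above because removing from the front is cheap and ownership stays explicit.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Abstract task; each subclass holds whatever data it needs.
struct Task
{
    virtual ~Task() = default;
    virtual std::string execute() = 0;
};

// Two illustrative (hypothetical) task types.
struct EchoTask : Task
{
    explicit EchoTask(std::string text) : text_(std::move(text)) {}
    std::string execute() override { return text_; }
    std::string text_;
};

struct SquareTask : Task
{
    explicit SquareTask(int n) : n_(n) {}
    std::string execute() override { return std::to_string(n_ * n_); }
    int n_;
};

// Runs all tasks on threadCount threads (the caller counts as one of them)
// and returns the results in completion order.
std::vector<std::string> runAll(std::deque<std::unique_ptr<Task>> tasks,
                                unsigned threadCount)
{
    assert(threadCount > 0);
    std::mutex mutex; // guards both tasks and results
    std::vector<std::string> results;

    auto worker = [&] {
        for (;;)
        {
            std::unique_ptr<Task> task;
            {
                std::lock_guard<std::mutex> lk(mutex);
                if (tasks.empty())
                    return; // no more work left
                task = std::move(tasks.front());
                tasks.pop_front();
            }
            std::string r = task->execute(); // run outside the lock
            std::lock_guard<std::mutex> lk(mutex);
            results.push_back(std::move(r));
        }
    };

    std::vector<std::thread> threads;
    for (unsigned i = 1; i < threadCount; ++i)
        threads.emplace_back(worker);
    worker(); // the calling thread works, too
    for (auto& t : threads)
        t.join();
    return results;
}
```

Storing the tasks through pointers keeps the dynamic type intact, and the lock is only held while touching the shared containers, never while a task is executing.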

Answer 2

If you don't care about the exact number of threads, the simplest solution would be:

std::vector<std::future<std::string>> futureStore(
    batchSize
);

std::generate(futureStore.begin(), futureStore.end(), [](){return std::async(someTask);});


for(auto& future : futureStore) {
    std::string value = future.get();
    doWork(value);
}

In my experience, std::async reuses threads once a certain number of threads has been spawned; it will not spawn 1000 threads. (The standard does not guarantee this, so the behavior depends on the implementation.) Also, you will not gain much of a performance boost (if any) from using a thread pool. I did measurements in the past, and the overall runtime was nearly identical.

The only reason I use thread pools now is to avoid the delay of creating threads in the computation loop. If you have timing constraints, you may miss deadlines when using std::async for the first time, since it creates the threads on the first calls.

There is a good thread pool library for these applications. Have a look here: https://github.com/vit-vit/ctpl

#include <ctpl.h>

const unsigned int numberOfThreads = 10;
const unsigned int batchSize = 1000;

ctpl::thread_pool pool(numberOfThreads /* fixed number of threads in the pool */);
std::vector<std::future<std::string>> futureStore(
    batchSize
);

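// CTPL passes the worker's thread id as the first argument,
// so someTask must accept an int parameter.
std::generate(futureStore.begin(), futureStore.end(), [](){ return pool.push(someTask);});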

for(auto& future : futureStore) {
    std::string value = future.get();
    doWork(value);
}
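
If adding a dependency is not an option, a fixed-size pool can also be sketched with the standard library alone. The class below is a rough illustration, not CTPL's API: the name FixedPool and its push signature are assumptions made for this example, and it only handles jobs returning std::string.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Minimal fixed-size pool (hypothetical, for illustration only).
class FixedPool
{
public:
    explicit FixedPool(unsigned threadCount)
    {
        assert(threadCount > 0);
        for (unsigned i = 0; i < threadCount; ++i)
            workers_.emplace_back([this] { run(); });
    }

    // Finishes all queued jobs, then joins the workers.
    ~FixedPool()
    {
        {
            std::lock_guard<std::mutex> lk(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (auto& w : workers_)
            w.join();
    }

    // Queues a job and returns a future for its result.
    std::future<std::string> push(std::function<std::string()> job)
    {
        auto task = std::make_shared<std::packaged_task<std::string()>>(std::move(job));
        std::future<std::string> result = task->get_future();
        {
            std::lock_guard<std::mutex> lk(mutex_);
            jobs_.push([task] { (*task)(); });
        }
        wake_.notify_one();
        return result;
    }

private:
    // Worker loop: sleep until there is a job or the pool is shutting down.
    void run()
    {
        for (;;)
        {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lk(mutex_);
                wake_.wait(lk, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty())
                    return; // stopping and nothing left to do
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job(); // run outside the lock
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

With such a pool, the loop from the question reduces to pushing batchSize jobs and calling get() on each returned future; at most threadCount jobs run at any time, and the threads are created once, before the computation starts.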

Answer 1
2 2017-06-13 09:11:07

Answer 2
1 2017-06-13 09:32:41
