
Wake up of the thread is time consuming

#ifndef THREADPOOL_H
#define THREADPOOL_H
#include <iostream>
#include <deque>
#include <functional>
#include <thread>
#include <condition_variable>
#include <mutex>
#include <atomic>
#include <vector>

//thread pool
class ThreadPool
{
public:
    ThreadPool(unsigned int n = std::thread::hardware_concurrency())
        : busy()
        , processed()
        , stop()
    {
        for (unsigned int i=0; i<n; ++i)
            workers.emplace_back(std::bind(&ThreadPool::thread_proc, this));
    }

    template<class F> void enqueue(F&& f)
    {
        std::unique_lock<std::mutex> lock(queue_mutex);
        tasks.emplace_back(std::forward<F>(f));
        cv_task.notify_one();
    }

    void waitFinished()
    {
        std::unique_lock<std::mutex> lock(queue_mutex);
        cv_finished.wait(lock, [this](){ return tasks.empty() && (busy == 0); });
    }

    ~ThreadPool()
    {
        // set stop-condition
        std::unique_lock<std::mutex> latch(queue_mutex);
        stop = true;
        cv_task.notify_all();
        latch.unlock();

        // all threads terminate, then we're done.
        for (auto& t : workers)
            t.join();
    }

    unsigned int getProcessed() const { return processed; }

private:
    std::vector< std::thread > workers;
    std::deque< std::function<void()> > tasks;
    std::mutex queue_mutex;
    std::condition_variable cv_task;
    std::condition_variable cv_finished;
    unsigned int busy;
    std::atomic_uint processed;
    bool stop;

    void thread_proc()
    {
        while (true)
        {
            std::unique_lock<std::mutex> latch(queue_mutex);
            cv_task.wait(latch, [this](){ return stop || !tasks.empty(); });
            if (!tasks.empty())
            {
                // got work. set busy.
                ++busy;

                // pull from queue
                auto fn = tasks.front();
                tasks.pop_front();

                // release lock. run async
                latch.unlock();

                // run function outside context
                fn();
                ++processed;

                latch.lock();
                --busy;
                cv_finished.notify_one();
            }
            else if (stop)
            {
                break;
            }
        }
    }
};
#endif // THREADPOOL_H

I have the above thread pool implementation using a latch. However, every time I add a task through the enqueue call, the overhead is quite large: it takes about 100 microseconds.

How can I improve the performance of the thread pool?

Your code looks fine. The comments on your question about compiling with release optimizations enabled are probably correct, and may be all you need to do.

Disclaimer: Always measure code first with appropriate tools to identify where the bottlenecks are before attempting to improve its performance. Otherwise, you might not get the improvements you seek.

But here are a couple of potential micro-optimizations I see.

Change this in your thread_proc function:

    while (true)
    {
        std::unique_lock<std::mutex> latch(queue_mutex);
        cv_task.wait(latch, [this](){ return stop || !tasks.empty(); });
        if (!tasks.empty())

To this:

    std::unique_lock<std::mutex> latch(queue_mutex);
    while (!stop)
    {
        cv_task.wait(latch, [this](){ return stop || !tasks.empty(); });
        while (!tasks.empty() && !stop)

Then remove the else if (stop) block at the end of the function.

The main impact this has is that it avoids the extra "unlock" and "lock" on queue_mutex that result from latch going out of scope on each iteration of the while loop. Changing if (!tasks.empty()) to while (!tasks.empty()) might save a cycle or two as well, by letting the currently executing thread, which still has the quantum, keep the lock and try to dequeue the next work item.
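Putting both changes together, the reworked pool would look something like the sketch below. It reuses the member names from the question; note that with `while (!tasks.empty() && !stop)` the workers abandon any queued tasks once stop is set, which matches the modification described above but differs slightly from the original drain-then-exit behavior:

```cpp
#include <atomic>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

class ThreadPool {
public:
    explicit ThreadPool(unsigned n = std::thread::hardware_concurrency()) {
        for (unsigned i = 0; i < n; ++i)
            workers.emplace_back([this] { thread_proc(); });
    }
    template <class F> void enqueue(F&& f) {
        {
            std::lock_guard<std::mutex> lk(queue_mutex);
            tasks.emplace_back(std::forward<F>(f));
        }
        cv_task.notify_one();
    }
    void waitFinished() {
        std::unique_lock<std::mutex> lk(queue_mutex);
        cv_finished.wait(lk, [this] { return tasks.empty() && busy == 0; });
    }
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lk(queue_mutex);
            stop = true;
        }
        cv_task.notify_all();
        for (auto& t : workers) t.join();
    }
    unsigned getProcessed() const { return processed; }

private:
    void thread_proc() {
        // The lock is acquired once here and held across loop iterations,
        // instead of being re-acquired at the top of every pass.
        std::unique_lock<std::mutex> latch(queue_mutex);
        while (!stop) {
            cv_task.wait(latch, [this] { return stop || !tasks.empty(); });
            // Keep the lock and drain the queue instead of waiting again
            // after each task.
            while (!tasks.empty() && !stop) {
                ++busy;
                auto fn = std::move(tasks.front());
                tasks.pop_front();
                latch.unlock();        // run the task without holding the lock
                fn();
                ++processed;
                latch.lock();
                --busy;
                cv_finished.notify_one();
            }
        }
    }

    std::vector<std::thread> workers;
    std::deque<std::function<void()>> tasks;
    std::mutex queue_mutex;
    std::condition_variable cv_task;
    std::condition_variable cv_finished;
    unsigned busy = 0;
    std::atomic_uint processed{0};
    bool stop = false;
};
```

This is a sketch, not a drop-in tested replacement; in particular, measure whether the held-lock loop actually helps under your workload before adopting it.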

One final thing. I'm always of the opinion that the notify should be outside the lock. That way, there's no lock contention when the other thread is woken up by the thread that just updated the queue. But I've never actually measured this assumption, so take it with a grain of salt:

template<class F> void enqueue(F&& f)
{
    queue_mutex.lock();
        tasks.emplace_back(std::forward<F>(f));
    queue_mutex.unlock();
    cv_task.notify_one();
}
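One caveat with the manual lock()/unlock() pair above: if emplace_back throws (for example, on allocation failure), the mutex is never released. A scoped std::lock_guard gives the same "notify outside the lock" effect while staying exception-safe. The sketch below wraps the idea in a stand-in struct so it compiles on its own; `queue_mutex`, `tasks`, and `cv_task` mirror the pool's members:

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Stand-in for the relevant ThreadPool members, for illustration only.
struct EnqueueSketch {
    std::mutex queue_mutex;
    std::deque<std::function<void()>> tasks;
    std::condition_variable cv_task;

    template <class F>
    void enqueue(F&& f) {
        {
            // Lock released at end of scope, even if emplace_back throws.
            std::lock_guard<std::mutex> lock(queue_mutex);
            tasks.emplace_back(std::forward<F>(f));
        }
        cv_task.notify_one();  // wake a worker without holding the lock
    }
};
```

Whether notifying outside the lock helps in practice is platform-dependent; some pthread implementations use "wait morphing" to avoid the spurious wakeup-then-block even when you notify while holding the mutex.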
