
Why does my Wait method hang indefinitely?

I am using this snippet of code to handle a pool of threads. After calling RunTask a number of times (where the task in question is a method that sleeps for 5 seconds), I call Wait() in an attempt to wait for all the threads to finish. The code hangs in this Wait() method. I suspect it's my management of the condition variable, but I can't figure it out.

class lthThreadHandler {
public:
    lthThreadHandler(int pool_size)
    {
        mAvailable = pool_size;
        mRunning = true;
        for (int i = 0; i < pool_size; ++i)
            mThreads.create_thread(boost::bind(&lthThreadHandler::pool_main, this));
    }

    template<typename Task>
    void RunTask(Task task)
    {
        boost::unique_lock<boost::mutex> lock(mMutex);

        // If no threads are available, then return.
        if (0 == mAvailable)
            return;

        // Decrement count, indicating thread is no longer available.
        --mAvailable;

        // Set task and signal condition variable so that a worker thread will
        // wake up and use the task.
        mTasks.push(boost::function<void()>(task));
        mCondition.notify_one();
    }

    void Wait() {
        try {
            mThreads.join_all();
        }
        // Suppress all exceptions.
        catch (...) {
        }
    }

    /// @brief Entry point for pool threads.
    void pool_main() {
        while (mRunning) {
            // Wait on condition variable while the task is empty and the pool is
            // still running.
            boost::unique_lock<boost::mutex> lock(mMutex);
            while (mTasks.empty() && mRunning) {
                mCondition.wait(lock);
            }
            // If pool is no longer running, break out.
            if (!mRunning)
                break;

            // Copy task locally and remove from the queue.  This is done within
            // its own scope so that the task object is destructed immediately
            // after running the task.  This is useful in the event that the
            // function contains shared_ptr arguments bound via bind.
            {
                boost::function<void()> task = mTasks.front();
                mTasks.pop();

                lock.unlock();

                // Run the task.
                try {
                    task();
                }
                // Suppress all exceptions.
                catch (...) {
                }
            }

            // Task has finished, so increment count of available threads.
            lock.lock();
            ++mAvailable;
        } // while mRunning
    }
private:
    std::queue<boost::function<void()> > mTasks;
    boost::thread_group mThreads;
    std::size_t mAvailable;
    boost::mutex mMutex;
    boost::condition_variable mCondition;
    bool mRunning;
};

This code snippet comes from another SO answer: Thread pool using boost asio

Wait hangs indefinitely because every worker thread keeps looping until mRunning is cleared, and nothing in your code ever clears it. Even if you did clear it, the unsynchronized access to the flag is a data race, so it might change your pet into a bag of popcorn.


So. Basically you butchered someone's perfectly fine task queue/thread pool implementation. Mostly by renaming things idly¹ and then

  • removing the crucial destructor
  • adding a Wait() that just joins threads without doing anything to cause these threads to finish.
  • adding race conditions on the mRunning flag

Even if you put the original destructor logic into Wait(), keep in mind you'll still have a race condition: there is no guarantee that the posted tasks will run before the (main) thread resets the mRunning flag.

What Now?

I suggest you don't meddle with the code.

  • If there is something you need it to do, ask a constructive question like "How can I make sure that XXX" instead of "Why does my broken code not YYY".
  • If you want to learn, ask about the original implementation, or about a single isolated change.

Alternatives

I have a few similar thread pools on [SO] that you might find easier to understand, so you could compare:

I answered a similar question once before (in response to the fact that you're trying to join_all from a member function instead of the destructor):

That answer could be very helpful to you in explaining why that kind of function is tricky.


¹ pool_main, really? Thread manager, really? All the names were originally better chosen.

