简体   繁体   中英

Race condition between terminating worker threads and main thread

I am having an issue with terminating worker threads from the main thread. So far each method I tried either leads to a race condition or dead lock.

The worker threads are stored in a inner class inside a class called ThreadPool, ThreadPool maintains a vector of these WorkerThreads using unique_ptr.

Here is the header for my ThreadPool:

class ThreadPool
{
public:
typedef void (*pFunc)(const wpath&, const Args&, Global::mFile_t&, std::mutex&, std::mutex&);       // function to point to
private:

    class WorkerThread
    {
    private:
        ThreadPool* const _thisPool;        // reference enclosing class

        // pointers to arguments
        wpath _pPath;               // member argument that will be modifyable to running thread
        Args * _pArgs;
        Global::mFile_t * _pMap;

        // flags for thread management
        bool _terminate;                    // terminate thread
        bool _busy;                         // is thread busy?
        bool _isRunning;

        // thread management members

        std::mutex              _threadMtx;
        std::condition_variable _threadCond;
        std::thread             _thisThread;

        // exception ptr
        std::exception_ptr _ex;

        // private copy constructor
        WorkerThread(const WorkerThread&): _thisPool(nullptr) {}
    public:
        WorkerThread(ThreadPool&, Args&, Global::mFile_t&);
        ~WorkerThread();

        void setPath(const wpath);          // sets a new task
        void terminate();                   // calls terminate on thread
        bool busy() const;                  // returns whether thread is busy doing task
        bool isRunning() const;             // returns whether thread is still running
        void join();                        // thread join wrapper
        std::exception_ptr exception() const;

        // actual worker thread running tasks
        void thisWorkerThread();
    };

    // thread specific information
    DWORD _numProcs;                        // number of processors on system
    unsigned _numThreads;                   // number of viable threads
    std::vector<std::unique_ptr<WorkerThread>> _vThreads;   // stores thread pointers - workaround for no move constructor in WorkerThread
    pFunc _task;                            // the task threads will call

    // synchronization members
    unsigned _barrierLimit;                 // limit before barrier goes down
    std::mutex _barrierMtx;                 // mutex for barrier
    std::condition_variable _barrierCond;   // condition for barrier
    std::mutex _coutMtx;

public:
    // argument mutex
    std::mutex matchesMap_mtx;
    std::mutex coutMatch_mtx;

    ThreadPool(pFunc f);

    // wake a thread and pass it a new parameter to work on
    void callThread(const wpath&);

    // barrier synchronization
    void synchronizeStartingThreads();

    // starts and synchronizes all threads in a sleep state
    void startThreads(Args&, Global::mFile_t&);

    // terminate threads
    void terminateThreads();

private:
};

So far the real issue I am having is that when calling terminateThreads() from main thread causes dead lock or race condition.

When I set my _terminate flag to true, there is a chance that the main will already exit scope and destruct all mutexes before the thread has had a chance to wake up and terminate. In fact I have gotten this crash quite a few times (console window displays: mutex destroyed while busy)

If I add a thread.join() after I notify_all() the thread, there is a chance the thread will terminate before the join occurs, causing an infinite dead lock, as joining to a terminated thread suspends the program indefinitely.

If I detach - same issue as above, but causes program crash

If I instead use a while(WorkerThread.isRunning()) Sleep(0); The program may crash because the main thread may exit before the WorkerThread reaches that last closing brace.

I am not sure what else to do to stop halt the main until all worker threads have terminated safely. Also, even with try-catch in thread and main, no exceptions are being caught. (everything I have tried leads to program crash)

What can I do to halt the main thread until worker threads have finished?

Here are the implementations of the primary functions:

Terminate Individual worker thread

void ThreadPool::WorkerThread::terminate()
{
    _terminate = true;
    _threadCond.notify_all();
    _thisThread.join();
}

The actual ThreadLoop

void ThreadPool::WorkerThread::thisWorkerThread()
{
    _thisPool->synchronizeStartingThreads();

    try
    {
        while (!_terminate)
        {
            {
                _thisPool->_coutMtx.lock();
                std::cout << std::this_thread::get_id() << " Sleeping..." << std::endl;
                _thisPool->_coutMtx.unlock();
                _busy = false;
                std::unique_lock<std::mutex> lock(_threadMtx);
                _threadCond.wait(lock);
            }
            _thisPool->_coutMtx.lock();
            std::cout << std::this_thread::get_id() << " Awake..." << std::endl;
            _thisPool->_coutMtx.unlock();
            if(_terminate)
                break;

            _thisPool->_task(_pPath, *_pArgs, *_pMap, _thisPool->coutMatch_mtx, _thisPool->matchesMap_mtx);

            _thisPool->_coutMtx.lock();
            std::cout << std::this_thread::get_id() << " Finished Task..." << std::endl;
            _thisPool->_coutMtx.unlock();

        }
        _thisPool->_coutMtx.lock();
        std::cout << std::this_thread::get_id() << " Terminating" << std::endl;
        _thisPool->_coutMtx.unlock();   
    }
    catch (const std::exception&)
    {
        _ex = std::current_exception();
    }
    _isRunning = false;
}

Terminate All Worker Threads

void ThreadPool::terminateThreads()
{
    for (std::vector<std::unique_ptr<WorkerThread>>::iterator it = _vThreads.begin(); it != _vThreads.end(); ++it)
    {
        it->get()->terminate();
        //it->get()->_thisThread.detach();

        // if thread threw an exception, rethrow it in main
        if (it->get()->exception() != nullptr)
            std::rethrow_exception(it->get()->exception());
    }
}

and lastly, the function that is calling the thread pool (the scan function is running on main)

// scans a path recursively for all files of selected extension type, calls thread to parse file
unsigned int Functions::Scan(wpath path, const Args& args, ThreadPool& pool)
{
    wrecursive_directory_iterator d(path), e;
    unsigned int filesFound = 0;
    while ( d != e )
    {
        if (args.verbose())
            std::wcout << L"Grepping: " << d->path().string() << std::endl;

        for (Args::ext_T::const_iterator it = args.extension().cbegin(); it != args.extension().cend(); ++it)
        {
            if (extension(d->path()) == *it)
            {
                ++filesFound;
                pool.callThread(d->path());
            }
        }
        ++d;
    }

    std::cout << "Scan Function: Calling TerminateThreads() " << std::endl;
    pool.terminateThreads();
    std::cout << "Scan Function: Called TerminateThreads() " << std::endl;
    return filesFound;
}

Ill repeat the question again: What can I do to halt the main thread until worker threads have finished?

I don't get the issue with thread termination and join.

Joining threads is all about waiting until the given thread has terminated, so it's exaclty what you want to do. If the thread has finished execution already, join will just return immediately.

So you'll just want to join each thread during the terminate call as you already do in your code.

Note: currently you immediately rethrow any exception if a thread you just terminated has an active exception_ptr . That might lead to unjoined threads. You'll have to keep that in mind when handling those exceptions

Update: after looking at your code, I see a potential bug: std::condition_variable::wait() can return when a spurious wakeup occurs. If that is the case, you will work again on the path that was worked on the last time, leading to wrong results. You should have a flag for new work that is set if new work has been added, and that _threadCond.wait(lock) line should be in a loop that checks for the flag and _terminate . Not sure if that one will fix your problem, though.

The problem was two fold:

synchronizeStartingThreads() would sometimes have 1 or 2 threads blocked, waiting for the okay to go ahead (a problem in the while (some_condition) barrierCond.wait(lock). The condition would sometimes never evaluate to true. removing the while loop fixed this blocking issue.

The second issue was the potential for a worker thread to enter the _threadMtx, and notify_all was called just before they entered the _threadCond.wait(), since notify was already called, the thread would wait forever.

ie.

{
    // terminate() is called
    std::unique_lock<std::mutex> lock(_threadMtx);
    // _threadCond.notify_all() is called here
    _busy = false;
    _threadCond.wait(lock);
    // thread is blocked forever
}

surprisingly, locking this mutex in terminate() did not stop this from happening.

This was solved by adding a timeout of 30ms to the _threadCond.wait()

Also, a check was added before the starting of task to make sure the same task wasn't being processed again.

The new code now looks like this:

thisWorkerThread

_threadCond.wait_for(lock, std::chrono::milliseconds(30));  // hold the lock a max of 30ms

// after the lock, and the termination check

if(_busy)
        {
            Global::mFile_t rMap = _thisPool->_task(_pPath, *_pArgs, _thisPool->coutMatch_mtx);
            _workerMap.element.insert(rMap.element.begin(), rMap.element.end());
        }

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM