Race condition between terminating worker threads and main thread

I am having an issue with terminating worker threads from the main thread. So far, each method I have tried leads to either a race condition or a deadlock.

The worker threads are stored in an inner class of a class called ThreadPool. ThreadPool maintains a vector of these WorkerThreads using unique_ptr.

Here is the header for my ThreadPool:

class ThreadPool
{
public:
typedef void (*pFunc)(const wpath&, const Args&, Global::mFile_t&, std::mutex&, std::mutex&);       // function to point to
private:

    class WorkerThread
    {
    private:
        ThreadPool* const _thisPool;        // reference enclosing class

        // pointers to arguments
        wpath _pPath;               // member argument that will be modifiable by the running thread
        Args * _pArgs;
        Global::mFile_t * _pMap;

        // flags for thread management
        bool _terminate;                    // terminate thread
        bool _busy;                         // is thread busy?
        bool _isRunning;

        // thread management members

        std::mutex              _threadMtx;
        std::condition_variable _threadCond;
        std::thread             _thisThread;

        // exception ptr
        std::exception_ptr _ex;

        // private copy constructor
        WorkerThread(const WorkerThread&): _thisPool(nullptr) {}
    public:
        WorkerThread(ThreadPool&, Args&, Global::mFile_t&);
        ~WorkerThread();

        void setPath(const wpath);          // sets a new task
        void terminate();                   // calls terminate on thread
        bool busy() const;                  // returns whether thread is busy doing task
        bool isRunning() const;             // returns whether thread is still running
        void join();                        // thread join wrapper
        std::exception_ptr exception() const;

        // actual worker thread running tasks
        void thisWorkerThread();
    };

    // thread specific information
    DWORD _numProcs;                        // number of processors on system
    unsigned _numThreads;                   // number of viable threads
    std::vector<std::unique_ptr<WorkerThread>> _vThreads;   // stores thread pointers - workaround for no move constructor in WorkerThread
    pFunc _task;                            // the task threads will call

    // synchronization members
    unsigned _barrierLimit;                 // limit before barrier goes down
    std::mutex _barrierMtx;                 // mutex for barrier
    std::condition_variable _barrierCond;   // condition for barrier
    std::mutex _coutMtx;

public:
    // argument mutex
    std::mutex matchesMap_mtx;
    std::mutex coutMatch_mtx;

    ThreadPool(pFunc f);

    // wake a thread and pass it a new parameter to work on
    void callThread(const wpath&);

    // barrier synchronization
    void synchronizeStartingThreads();

    // starts and synchronizes all threads in a sleep state
    void startThreads(Args&, Global::mFile_t&);

    // terminate threads
    void terminateThreads();

private:
};

So far, the real issue is that calling terminateThreads() from the main thread causes a deadlock or a race condition.

When I set my _terminate flag to true, there is a chance that main will already have exited scope and destroyed all the mutexes before the thread has had a chance to wake up and terminate. In fact, I have gotten this crash quite a few times (the console window displays: mutex destroyed while busy).

If I add a thread.join() after I notify_all() the thread, there is a chance the thread will terminate before the join occurs, causing an infinite deadlock, as joining a terminated thread suspends the program indefinitely.

If I detach: same issue as above, but it causes a program crash.

If I instead use a while(WorkerThread.isRunning()) Sleep(0); the program may crash because the main thread may exit before the WorkerThread reaches that last closing brace.

I am not sure what else to do to halt main until all worker threads have terminated safely. Also, even with try-catch in the thread and in main, no exceptions are being caught (everything I have tried leads to a program crash).

What can I do to halt the main thread until worker threads have finished?

Here are the implementations of the primary functions:

Terminate individual worker thread

void ThreadPool::WorkerThread::terminate()
{
    _terminate = true;
    _threadCond.notify_all();
    _thisThread.join();
}

The actual ThreadLoop

void ThreadPool::WorkerThread::thisWorkerThread()
{
    _thisPool->synchronizeStartingThreads();

    try
    {
        while (!_terminate)
        {
            {
                _thisPool->_coutMtx.lock();
                std::cout << std::this_thread::get_id() << " Sleeping..." << std::endl;
                _thisPool->_coutMtx.unlock();
                _busy = false;
                std::unique_lock<std::mutex> lock(_threadMtx);
                _threadCond.wait(lock);
            }
            _thisPool->_coutMtx.lock();
            std::cout << std::this_thread::get_id() << " Awake..." << std::endl;
            _thisPool->_coutMtx.unlock();
            if(_terminate)
                break;

            _thisPool->_task(_pPath, *_pArgs, *_pMap, _thisPool->coutMatch_mtx, _thisPool->matchesMap_mtx);

            _thisPool->_coutMtx.lock();
            std::cout << std::this_thread::get_id() << " Finished Task..." << std::endl;
            _thisPool->_coutMtx.unlock();

        }
        _thisPool->_coutMtx.lock();
        std::cout << std::this_thread::get_id() << " Terminating" << std::endl;
        _thisPool->_coutMtx.unlock();   
    }
    catch (const std::exception&)
    {
        _ex = std::current_exception();
    }
    _isRunning = false;
}

Terminate all worker threads

void ThreadPool::terminateThreads()
{
    for (std::vector<std::unique_ptr<WorkerThread>>::iterator it = _vThreads.begin(); it != _vThreads.end(); ++it)
    {
        it->get()->terminate();
        //it->get()->_thisThread.detach();

        // if thread threw an exception, rethrow it in main
        if (it->get()->exception() != nullptr)
            std::rethrow_exception(it->get()->exception());
    }
}

And lastly, the function that calls the thread pool (the scan function runs on main):

// scans a path recursively for all files of selected extension type, calls thread to parse file
unsigned int Functions::Scan(wpath path, const Args& args, ThreadPool& pool)
{
    wrecursive_directory_iterator d(path), e;
    unsigned int filesFound = 0;
    while ( d != e )
    {
        if (args.verbose())
            std::wcout << L"Grepping: " << d->path().string() << std::endl;

        for (Args::ext_T::const_iterator it = args.extension().cbegin(); it != args.extension().cend(); ++it)
        {
            if (extension(d->path()) == *it)
            {
                ++filesFound;
                pool.callThread(d->path());
            }
        }
        ++d;
    }

    std::cout << "Scan Function: Calling TerminateThreads() " << std::endl;
    pool.terminateThreads();
    std::cout << "Scan Function: Called TerminateThreads() " << std::endl;
    return filesFound;
}

I'll repeat the question: What can I do to halt the main thread until worker threads have finished?

I don't get the issue with thread termination and join.

Joining a thread is all about waiting until the given thread has terminated, so it's exactly what you want to do. If the thread has already finished execution, join will just return immediately.
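A minimal sketch of that behaviour, using only std::thread. The helper name joinAfterFinish is hypothetical and the 50ms pause is just a generous assumption to let the thread exit before the join:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: starts a thread, lets it finish, and only then joins it.
// Returns true once join() has returned and the thread is no longer joinable.
bool joinAfterFinish()
{
    std::atomic<bool> done{false};
    std::thread t([&done] { done = true; });

    // Wait until the thread body has run, then give it time to exit completely.
    while (!done)
        std::this_thread::yield();
    std::this_thread::sleep_for(std::chrono::milliseconds(50));

    assert(t.joinable());   // a finished but unjoined thread is still joinable
    t.join();               // returns promptly instead of blocking forever
    return !t.joinable();
}
```

If a join on a finished thread appears to hang, the thread has most likely not finished at all and is still blocked somewhere, for example in a condition variable wait.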

So you'll just want to join each thread during the terminate call, as you already do in your code.

Note: currently you immediately rethrow any exception if a thread you just terminated has an active exception_ptr. That might lead to unjoined threads. You'll have to keep that in mind when handling those exceptions.
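One way to avoid leaving threads unjoined is to join every worker first and rethrow afterwards. The following is only a sketch under assumptions: JoinableWorker, joinAllThenRethrow and rethrowLeavesNoThreadUnjoined are hypothetical names, and the struct is a stripped-down stand-in for WorkerThread:

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <thread>
#include <vector>

// Hypothetical stand-in for WorkerThread: a thread plus the exception it caught.
struct JoinableWorker
{
    std::thread thread;
    std::exception_ptr error;
};

// Joins every worker first and only then rethrows the first stored exception,
// so no thread is left unjoined while the exception propagates.
void joinAllThenRethrow(std::vector<JoinableWorker>& workers)
{
    std::exception_ptr first;
    for (JoinableWorker& w : workers)
    {
        if (w.thread.joinable())
            w.thread.join();
        assert(!w.thread.joinable());
        if (w.error && !first)
            first = w.error;
    }
    if (first)
        std::rethrow_exception(first);
}

// Demo: three workers, the second one fails. Returns true if the exception
// reached the caller and every thread had been joined by then.
bool rethrowLeavesNoThreadUnjoined()
{
    std::vector<JoinableWorker> workers(3);
    for (std::size_t i = 0; i < workers.size(); ++i)
    {
        JoinableWorker* w = &workers[i];
        w->thread = std::thread([w, i] {
            try
            {
                if (i == 1)
                    throw std::runtime_error("task failed");
            }
            catch (const std::exception&)
            {
                w->error = std::current_exception();
            }
        });
    }

    bool caught = false;
    try
    {
        joinAllThenRethrow(workers);
    }
    catch (const std::runtime_error&)
    {
        caught = true;
    }

    for (const JoinableWorker& w : workers)
        if (w.thread.joinable())
            return false;
    return caught;
}
```

Only the first exception is reported here; collecting all of them would be an equally valid choice.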

Update: after looking at your code, I see a potential bug: std::condition_variable::wait() can return when a spurious wakeup occurs. If that happens, you will work again on the path that was processed last time, leading to wrong results. You should have a flag for new work that is set when new work has been added, and that _threadCond.wait(lock) line should be in a loop that checks for the flag and _terminate. Not sure if that one will fix your problem, though.
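A sketch of what such a flag-guarded wait could look like. PathWorker is a hypothetical, simplified stand-in for WorkerThread: the _hasWork flag, the std::string path and the _processed vector are assumptions made for illustration, and the "task" merely records the path:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for WorkerThread.
class PathWorker
{
    std::mutex _mtx;
    std::condition_variable _cond;
    bool _hasWork = false;                  // set by setPath(), cleared by the worker
    bool _terminate = false;
    std::string _path;
    std::vector<std::string> _processed;    // stand-in for the real task's output
    std::thread _thread;                    // declared last so it starts after the rest

    void run()
    {
        std::unique_lock<std::mutex> lock(_mtx);
        for (;;)
        {
            // Spurious wakeups are absorbed: wait() re-checks the predicate.
            _cond.wait(lock, [this] { return _hasWork || _terminate; });
            if (!_hasWork)
                return;                     // woken only to terminate
            _processed.push_back(std::move(_path));   // a real task would run unlocked
            _hasWork = false;
            _cond.notify_all();             // lets a blocked setPath() continue
        }
    }

public:
    PathWorker() : _thread(&PathWorker::run, this) {}
    ~PathWorker() { terminate(); }

    void setPath(std::string path)
    {
        std::unique_lock<std::mutex> lock(_mtx);
        assert(!_terminate);                // no new work after terminate()
        _cond.wait(lock, [this] { return !_hasWork; });
        _path = std::move(path);
        _hasWork = true;
        _cond.notify_all();
    }

    void terminate()
    {
        {
            std::lock_guard<std::mutex> lock(_mtx);
            _terminate = true;              // changed under the mutex the worker waits on
        }
        _cond.notify_all();
        if (_thread.joinable())
            _thread.join();
    }

    std::vector<std::string> processed()
    {
        std::lock_guard<std::mutex> lock(_mtx);
        return _processed;
    }
};
```

Because the worker clears the flag itself, each path is handled exactly once, and pending work is drained before the thread honours _terminate.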

The problem was twofold:

synchronizeStartingThreads() would sometimes leave 1 or 2 threads blocked, waiting for the okay to go ahead (a problem in the while (some_condition) barrierCond.wait(lock) loop). The condition would sometimes never evaluate to true. Removing the while loop fixed this blocking issue.
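The body of synchronizeStartingThreads() is not shown, so the cause can only be guessed at. As a sketch of one way to keep the loop (and with it the protection against spurious wakeups), the wait can test a flag that stays set once the last thread has arrived. StartBarrier, arriveAndWait and runThroughBarrier are hypothetical names:

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical one-shot start barrier: it opens when `limit` threads have arrived
// and then stays open, so a late waker can never miss the condition.
class StartBarrier
{
    std::mutex _mtx;
    std::condition_variable _cond;
    const unsigned _limit;
    unsigned _arrived = 0;
    bool _open = false;

public:
    explicit StartBarrier(unsigned limit) : _limit(limit) { assert(limit > 0); }

    void arriveAndWait()
    {
        std::unique_lock<std::mutex> lock(_mtx);
        if (++_arrived >= _limit)
        {
            _open = true;           // sticky flag, never reset
            _cond.notify_all();
            return;
        }
        _cond.wait(lock, [this] { return _open; });
    }
};

// Starts n threads that all pass through the barrier; returns how many got past it.
unsigned runThroughBarrier(unsigned n)
{
    StartBarrier barrier(n);
    std::atomic<unsigned> passed{0};
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < n; ++i)
        threads.emplace_back([&barrier, &passed] {
            barrier.arriveAndWait();
            ++passed;
        });
    for (std::thread& t : threads)
        t.join();
    return passed;
}
```

A predicate based on a counter that is reset or decremented after release is a common way for such a loop to block forever, which is why this sketch uses a flag that is never cleared.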

The second issue was the potential for a worker thread to lock _threadMtx and for notify_all to be called just before it entered _threadCond.wait(). Since notify had already been called, the thread would wait forever.

i.e.

{
    // terminate() is called
    std::unique_lock<std::mutex> lock(_threadMtx);
    // _threadCond.notify_all() is called here
    _busy = false;
    _threadCond.wait(lock);
    // thread is blocked forever
}

Surprisingly, locking this mutex in terminate() did not stop this from happening.

This was solved by adding a timeout of 30ms to the _threadCond.wait().

Also, a check was added before the start of a task to make sure the same task wasn't processed again.
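A plausible reason why locking the mutex in terminate() did not help is that the wait had no predicate: a plain wait(lock) blocks even if the flag was already set and the notification already sent. As an alternative to the timeout, the sketch below sets the flag under the mutex and waits with a predicate, so a notification that arrives before the wait is harmless. The function name notifyBeforeWaitIsSafe is hypothetical:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Returns true if the waiter finishes even though notify_all() ran before wait().
bool notifyBeforeWaitIsSafe()
{
    std::mutex mtx;
    std::condition_variable cond;
    bool terminate = false;

    // Signal first: set the flag under the mutex, then notify with nobody waiting.
    {
        std::lock_guard<std::mutex> lock(mtx);
        terminate = true;
    }
    cond.notify_all();

    bool woke = false;
    std::thread waiter([&] {
        std::unique_lock<std::mutex> lock(mtx);
        // The predicate is checked before blocking, so the earlier
        // notification does not need to be observed.
        cond.wait(lock, [&] { return terminate; });
        assert(terminate);          // wait() only returns once the predicate holds
        woke = true;
    });
    waiter.join();
    return woke;
}
```

With this pattern the 30ms polling interval is not required for correctness, although keeping a timeout as a safety net is still possible via wait_for with the same predicate.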

The new code now looks like this:

thisWorkerThread

_threadCond.wait_for(lock, std::chrono::milliseconds(30));  // wait at most 30ms for a notification (the lock is released while waiting)

// after the lock, and the termination check

if (_busy)
{
    Global::mFile_t rMap = _thisPool->_task(_pPath, *_pArgs, _thisPool->coutMatch_mtx);
    _workerMap.element.insert(rMap.element.begin(), rMap.element.end());
}
