Race condition between terminating worker threads and main thread
I am having an issue with terminating worker threads from the main thread. So far, each method I have tried leads to either a race condition or a deadlock.
The worker threads are stored in an inner class inside a class called ThreadPool. ThreadPool maintains a vector of these WorkerThreads using unique_ptr.
Here is the header for my ThreadPool:
class ThreadPool
{
public:
typedef void (*pFunc)(const wpath&, const Args&, Global::mFile_t&, std::mutex&, std::mutex&); // function to point to
private:
class WorkerThread
{
private:
ThreadPool* const _thisPool; // reference enclosing class
// pointers to arguments
wpath _pPath; // member argument that will be modifyable to running thread
Args * _pArgs;
Global::mFile_t * _pMap;
// flags for thread management
bool _terminate; // terminate thread
bool _busy; // is thread busy?
bool _isRunning;
// thread management members
std::mutex _threadMtx;
std::condition_variable _threadCond;
std::thread _thisThread;
// exception ptr
std::exception_ptr _ex;
// private copy constructor
WorkerThread(const WorkerThread&): _thisPool(nullptr) {}
public:
WorkerThread(ThreadPool&, Args&, Global::mFile_t&);
~WorkerThread();
void setPath(const wpath); // sets a new task
void terminate(); // calls terminate on thread
bool busy() const; // returns whether thread is busy doing task
bool isRunning() const; // returns whether thread is still running
void join(); // thread join wrapper
std::exception_ptr exception() const;
// actual worker thread running tasks
void thisWorkerThread();
};
// thread specific information
DWORD _numProcs; // number of processors on system
unsigned _numThreads; // number of viable threads
std::vector<std::unique_ptr<WorkerThread>> _vThreads; // stores thread pointers - workaround for no move constructor in WorkerThread
pFunc _task; // the task threads will call
// synchronization members
unsigned _barrierLimit; // limit before barrier goes down
std::mutex _barrierMtx; // mutex for barrier
std::condition_variable _barrierCond; // condition for barrier
std::mutex _coutMtx;
public:
// argument mutex
std::mutex matchesMap_mtx;
std::mutex coutMatch_mtx;
ThreadPool(pFunc f);
// wake a thread and pass it a new parameter to work on
void callThread(const wpath&);
// barrier synchronization
void synchronizeStartingThreads();
// starts and synchronizes all threads in a sleep state
void startThreads(Args&, Global::mFile_t&);
// terminate threads
void terminateThreads();
private:
};
So far the real issue I am having is that calling terminateThreads() from the main thread causes a deadlock or a race condition.
When I set my _terminate flag to true, there is a chance that main will already have exited scope and destructed all mutexes before the thread has had a chance to wake up and terminate. In fact, I have gotten this crash quite a few times (the console window displays: mutex destroyed while busy).
If I add a thread.join() after I notify_all() the thread, there is a chance the thread will terminate before the join occurs, causing an infinite deadlock, as joining a terminated thread suspends the program indefinitely.
If I detach: same issue as above, but it causes a program crash.
If I instead use while(WorkerThread.isRunning()) Sleep(0); the program may crash because the main thread may exit before the WorkerThread reaches that last closing brace.
I am not sure what else to do to halt main until all worker threads have terminated safely. Also, even with try-catch in the thread and in main, no exceptions are being caught. (Everything I have tried leads to a program crash.)
What can I do to halt the main thread until worker threads have finished?
Here are the implementations of the primary functions:
Terminate individual worker thread
void ThreadPool::WorkerThread::terminate()
{
_terminate = true;
_threadCond.notify_all();
_thisThread.join();
}
The actual thread loop
void ThreadPool::WorkerThread::thisWorkerThread()
{
_thisPool->synchronizeStartingThreads();
try
{
while (!_terminate)
{
{
_thisPool->_coutMtx.lock();
std::cout << std::this_thread::get_id() << " Sleeping..." << std::endl;
_thisPool->_coutMtx.unlock();
_busy = false;
std::unique_lock<std::mutex> lock(_threadMtx);
_threadCond.wait(lock);
}
_thisPool->_coutMtx.lock();
std::cout << std::this_thread::get_id() << " Awake..." << std::endl;
_thisPool->_coutMtx.unlock();
if(_terminate)
break;
_thisPool->_task(_pPath, *_pArgs, *_pMap, _thisPool->coutMatch_mtx, _thisPool->matchesMap_mtx);
_thisPool->_coutMtx.lock();
std::cout << std::this_thread::get_id() << " Finished Task..." << std::endl;
_thisPool->_coutMtx.unlock();
}
_thisPool->_coutMtx.lock();
std::cout << std::this_thread::get_id() << " Terminating" << std::endl;
_thisPool->_coutMtx.unlock();
}
catch (const std::exception&)
{
_ex = std::current_exception();
}
_isRunning = false;
}
Terminate all worker threads
void ThreadPool::terminateThreads()
{
for (std::vector<std::unique_ptr<WorkerThread>>::iterator it = _vThreads.begin(); it != _vThreads.end(); ++it)
{
it->get()->terminate();
//it->get()->_thisThread.detach();
// if thread threw an exception, rethrow it in main
if (it->get()->exception() != nullptr)
std::rethrow_exception(it->get()->exception());
}
}
And lastly, the function that calls the thread pool (the scan function runs on main):
// scans a path recursively for all files of selected extension type, calls thread to parse file
unsigned int Functions::Scan(wpath path, const Args& args, ThreadPool& pool)
{
wrecursive_directory_iterator d(path), e;
unsigned int filesFound = 0;
while ( d != e )
{
if (args.verbose())
std::wcout << L"Grepping: " << d->path().string() << std::endl;
for (Args::ext_T::const_iterator it = args.extension().cbegin(); it != args.extension().cend(); ++it)
{
if (extension(d->path()) == *it)
{
++filesFound;
pool.callThread(d->path());
}
}
++d;
}
std::cout << "Scan Function: Calling TerminateThreads() " << std::endl;
pool.terminateThreads();
std::cout << "Scan Function: Called TerminateThreads() " << std::endl;
return filesFound;
}
I'll repeat the question: What can I do to halt the main thread until worker threads have finished?
I don't see the issue with thread termination and join.
Joining a thread just means waiting until that thread has terminated, so it is exactly what you want to do. If the thread has already finished execution, join will return immediately.
So you'll just want to join each thread during the terminate call, as you already do in your code.
Note: currently you immediately rethrow any exception if a thread you just terminated has an active exception_ptr. That might lead to unjoined threads. You'll have to keep that in mind when handling those exceptions.
Update: after looking at your code, I see a potential bug: std::condition_variable::wait() can return when a spurious wakeup occurs. If that happens, you will work again on the path that was processed last time, leading to wrong results. You should have a flag for new work that is set whenever new work has been added, and that _threadCond.wait(lock) line should be in a loop that checks for the flag and _terminate. Not sure if that one will fix your problem, though.
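The flag-plus-loop suggestion could look roughly like the sketch below. It is a simplified illustration under several assumptions: PathWorker, hasWork_ and the std::string path are hypothetical stand-ins for the poster's WorkerThread, its flags and wpath, and the predicate overload of wait() plays the role of the explicit loop.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical simplified worker that holds at most one pending path.
class PathWorker {
public:
    explicit PathWorker(std::function<void(const std::string&)> task)
        : task_(std::move(task)), thread_(&PathWorker::run, this) {}

    ~PathWorker() { terminate(); }

    PathWorker(const PathWorker&) = delete;
    PathWorker& operator=(const PathWorker&) = delete;

    // Hands a new path to the worker and wakes it.
    void setPath(std::string path) {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            path_ = std::move(path);
            hasWork_ = true;
        }
        cond_.notify_one();
    }

    // Requests shutdown and waits for the thread to exit; safe to call twice.
    void terminate() {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            terminate_ = true;
        }
        cond_.notify_one();
        if (thread_.joinable()) thread_.join();
    }

private:
    void run() {
        for (;;) {
            std::string path;
            {
                std::unique_lock<std::mutex> lock(mtx_);
                // The predicate is re-checked after every wakeup, so a spurious
                // wakeup simply goes back to sleep instead of re-running old work.
                cond_.wait(lock, [this] { return hasWork_ || terminate_; });
                if (!hasWork_) return;   // terminate requested and nothing pending
                path = std::move(path_);
                hasWork_ = false;
            }
            task_(path);                 // run the task without holding the lock
        }
    }

    std::mutex mtx_;
    std::condition_variable cond_;
    std::string path_;
    bool hasWork_ = false;
    bool terminate_ = false;
    std::function<void(const std::string&)> task_;
    std::thread thread_;                 // declared last so it starts after the other members exist
};
```

Note that setPath overwrites a path that has not been picked up yet, so a real pool would still need a busy check or a queue in front of it.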
The problem was twofold:
synchronizeStartingThreads() would sometimes leave 1 or 2 threads blocked, waiting for the okay to go ahead (a problem in the while (some_condition) barrierCond.wait(lock) loop). The condition would sometimes never evaluate to true. Removing the while loop fixed this blocking issue.
The second issue was that a worker thread could lock _threadMtx while notify_all was called just before the thread reached _threadCond.wait(). Since the notification had already been sent, the thread would wait forever.
i.e.:
{
// terminate() is called
std::unique_lock<std::mutex> lock(_threadMtx);
// _threadCond.notify_all() is called here
_busy = false;
_threadCond.wait(lock);
// thread is blocked forever
}
Surprisingly, locking this mutex in terminate() did not stop this from happening.
This was solved by adding a timeout of 30 ms to the _threadCond.wait() call.
Also, a check was added before starting a task to make sure the same task wasn't being processed again.
The new code now looks like this:
thisWorkerThread
_threadCond.wait_for(lock, std::chrono::milliseconds(30)); // wait at most 30 ms; the lock is released while waiting
// after the lock, and the termination check
if(_busy)
{
Global::mFile_t rMap = _thisPool->_task(_pPath, *_pArgs, _thisPool->coutMatch_mtx);
_workerMap.element.insert(rMap.element.begin(), rMap.element.end());
}