C++ std::thread stopping condition for thread pool

I am writing a program that uses a thread pool to search files with a specified extension for matches to a regular expression.

My thread pool looks like this:

for( int i = 0; i < _nThreads; ++i )
{
    _threads.push_back( thread( &ThreadPool::GrepFunc, this ) );
}

and the running function looks like this:

void ThreadPool::GrepFunc()
{
    // implement a barrier

    while( !_done )
    {
        while( !_tasks.empty() )
        {
            fs::path task;
            bool gotTask = false;
            {
                lock_guard<mutex> tl( _taskMutex );
                if( !_tasks.empty() )
                {
                    task = _tasks.front();
                    _tasks.pop();
                    gotTask = true;
                }
            }

            if( gotTask )
            {
                if( std::tr2::sys::is_directory( task ) )
                {
                    for( fs::directory_iterator dirIter( task ), endIter; dirIter != endIter; ++dirIter )
                    {
                        if( fs::is_directory( dirIter->path() ) )
                        {
                            lock_guard<mutex> tl( _taskMutex );
                            _tasks.push( dirIter->path() );
                        }
                        else
                        {
                            for( auto& e : _args.extensions() )
                            {
                                if( !dirIter->path().extension().compare( e ) )
                                {
                                    SearchFile( dirIter->path() );
                                }
                            }
                        }
                    }
                }
                else
                {
                    for( auto& e : _args.extensions() )
                    {
                        if( !task.extension().compare( e ) )
                        {
                            SearchFile( task );
                        }
                    }
                }
            }
        }
    }
}

Essentially, the program receives an initial directory from the user and recursively searches it and all subdirectories for files with a matching extension, looking for regex matches. I am having trouble determining the stopping condition, i.e. when _done should be set. I need to ensure that every directory and file under the initial directory has been scanned, and that every item in _tasks has been completed, before I join the threads. Any thoughts would be appreciated.

I'd suggest having one thread (possibly the same thread spawning the file-processing threads) dedicated to doing the recursive filesystem search for matching files; it can add the files into a work queue from which the file-searching threads can pick up work. You can use a condition variable to coordinate this.

Coordinating shutdown is a little tricky, as you've found. After the filesystem-search thread has completed its search, it can set a "just finish what's queued" flag visible to the worker threads, then signal them all to wake up and try to process another file; a worker that finds the work queue empty exits. The filesystem-search thread then joins all workers.
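A minimal sketch of this shutdown scheme, under stated assumptions: WorkQueue and processAll are hypothetical names that do not appear in the question, the loop that pushes file names stands in for the recursive directory walk, and the counter stands in for SearchFile.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical work queue: one producer pushes file paths, workers pop them.
class WorkQueue {
public:
    void push(std::string file) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!finishing_ && "push after finish()");
            files_.push(std::move(file));
        }
        ready_.notify_one();
    }

    // Called once by the producer after the directory walk is complete.
    void finish() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            finishing_ = true;  // the "just finish what's queued" flag
        }
        ready_.notify_all();    // wake every worker so it can re-check
    }

    // Blocks until a file is available. Returns false once the queue is
    // empty and finish() has been called, which tells the worker to exit.
    bool pop(std::string& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !files_.empty() || finishing_; });
        if (files_.empty()) return false;
        out = std::move(files_.front());
        files_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::string> files_;
    bool finishing_ = false;
};

// Runs nWorkers threads over the given files; returns how many were processed.
std::size_t processAll(const std::vector<std::string>& files, unsigned nWorkers) {
    WorkQueue queue;
    std::atomic<std::size_t> processed{0};
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < nWorkers; ++i) {
        workers.emplace_back([&] {
            std::string file;
            while (queue.pop(file)) {
                ++processed;  // stand-in for SearchFile(file)
            }
        });
    }
    for (const auto& f : files) queue.push(f);  // stand-in for the directory walk
    queue.finish();
    for (auto& w : workers) w.join();
    return processed.load();
}
```

Because the flag is only set after the last push, a worker that sees an empty queue together with the flag can be sure no more work will arrive.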

Regarding your updated question in the comment on Tony's answer, I would suggest having two kinds of tasks: one for exploring subdirectories recursively and one for grep. You need a SynQueue<TaskBase>, TaskSubDir : TaskBase, and TaskGrep : TaskBase. TaskBase has a virtual interface function Run(). The threads can then repeatedly pop from SynQueue and call TaskBase::Run():

  1. If it gets a TaskSubDir, it lists the subdirectories and files in the given path: (a) for a folder, it adds a new TaskSubDir for that subdirectory to SynQueue, so that folders are searched recursively by the thread pool; (b) for a file with a matching extension, it pushes a TaskGrep to SynQueue.
  2. If it gets a TaskGrep, it performs the SearchFile.
  3. If the queue is empty, it breaks out of the worker function.

This way, you don't need two queues, and you don't have to wait for the subdirectory queue to finish before starting the grep queue.

So answering your question: to determine the joining condition, all you need to do is to wait for all the threads to break out of the worker function.
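The two-task design could be sketched roughly as follows. Several things here are assumptions rather than part of the answer: the Node struct is an in-memory stand-in for the filesystem, Pool is a hypothetical class combining the queue with the worker function, and the grepped counter replaces SearchFile. The sketch also goes one step beyond "queue is empty": it counts tasks that are queued or still running, since the queue can be momentarily empty while another worker is about to push subdirectories.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// In-memory stand-in for a directory tree, so the sketch needs no real files.
struct Node {
    std::string name;
    bool isDir;
    std::vector<Node> children;  // empty for files
};

class Pool;
struct TaskBase {
    virtual ~TaskBase() = default;
    virtual void Run(Pool& pool) = 0;
};

class Pool {  // hypothetical: the task queue plus the worker function
public:
    explicit Pool(std::string e) : ext(std::move(e)) {}

    void push(std::unique_ptr<TaskBase> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
            ++pending_;
        }
        cv_.notify_one();
    }

    // Worker function: returns once nothing is queued and nothing is running.
    void work() {
        for (;;) {
            std::unique_ptr<TaskBase> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return !tasks_.empty() || pending_ == 0; });
                if (tasks_.empty()) return;
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task->Run(*this);  // may push further tasks
            std::lock_guard<std::mutex> lock(mutex_);
            assert(pending_ > 0);
            if (--pending_ == 0) cv_.notify_all();  // wake idle workers so they exit
        }
    }

    const std::string ext;                // extension to match, e.g. ".txt"
    std::atomic<std::size_t> grepped{0};  // stand-in for SearchFile results

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::unique_ptr<TaskBase>> tasks_;
    std::size_t pending_ = 0;  // tasks queued or currently running
};

struct TaskGrep : TaskBase {
    explicit TaskGrep(const Node& f) : file(f) {}
    void Run(Pool& pool) override { ++pool.grepped; }  // SearchFile(file) would go here
    const Node& file;
};

struct TaskSubDir : TaskBase {
    explicit TaskSubDir(const Node& d) : dir(d) {}
    void Run(Pool& pool) override {
        for (const Node& c : dir.children) {
            const std::string& n = c.name;
            const std::string& e = pool.ext;
            if (c.isDir)
                pool.push(std::make_unique<TaskSubDir>(c));
            else if (n.size() >= e.size() && n.compare(n.size() - e.size(), e.size(), e) == 0)
                pool.push(std::make_unique<TaskGrep>(c));
        }
    }
    const Node& dir;
};

// Seeds the pool with the root directory, runs the workers, and joins them.
std::size_t grepTree(const Node& root, const std::string& ext, unsigned nThreads) {
    Pool pool(ext);
    pool.push(std::make_unique<TaskSubDir>(root));  // seed before starting workers
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < nThreads; ++i) threads.emplace_back([&pool] { pool.work(); });
    for (auto& t : threads) t.join();
    return pool.grepped.load();
}
```

With this arrangement the caller only has to join the threads: each worker leaves work() on its own once the pending count reaches zero.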

Final note: the first _tasks.empty() check in your code is not protected by the mutex and may suffer from a race condition. I suggest hiding the mutex and cond_var in a SynQueue class and adding a SynQueue::empty() member function (protected by the mutex). If efficiency is a concern, you may want to consider a lock-free queue in place of SynQueue.
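One possible shape for such a SynQueue, as a sketch only: the answer does not define its interface, so try_pop and the drainWithThreads helper are assumptions, and the condition variable is left out to keep the example short. try_pop is included because calling empty() and then pop() as two separate locked steps would still let another thread take the last item in between.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

template <typename T>
class SynQueue {
public:
    void push(T value) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push(std::move(value));
    }

    // Safe to call from any thread, but the answer may be stale by the
    // time the caller acts on it.
    bool empty() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return items_.empty();
    }

    // Checks and pops under one lock, so no other thread can take the
    // item between the check and the pop.
    bool try_pop(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }

private:
    mutable std::mutex mutex_;  // mutable so empty() can stay const
    std::queue<T> items_;
};

// Pushes nItems integers, then lets nThreads workers drain the queue.
// Returns how many items were popped in total.
int drainWithThreads(int nItems, unsigned nThreads) {
    SynQueue<int> queue;
    for (int i = 0; i < nItems; ++i) queue.push(i);
    std::atomic<int> popped{0};
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < nThreads; ++t) {
        threads.emplace_back([&] {
            int item = 0;
            while (queue.try_pop(item)) ++popped;
        });
    }
    for (auto& th : threads) th.join();
    assert(queue.empty());
    return popped.load();
}
```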
