
pthread_join - multiple threads waiting

Using POSIX threads and C++, I have an "Insert operation" that can only be done safely one at a time.

Suppose multiple threads are waiting to insert: each calls pthread_join on the current insert thread and then spawns a new thread when it finishes. Will they all receive the "thread complete" signal at once and spawn multiple inserts, or is it safe to assume that the thread that receives the "thread complete" signal first will spawn a new thread, blocking the others from creating new threads?

/* --- GLOBAL --- */
pthread_t insertThread;



/* --- DIFFERENT THREADS --- */
// Wait for Current insert to finish
pthread_join(insertThread, NULL); 

// Done start a new one
pthread_create(&insertThread, NULL, Insert, Data);

Thank you for the replies.

The program is basically a huge hash table which takes requests from clients through sockets.

Each new client connection spawns a new thread, from which it can then perform multiple operations, specifically lookups or inserts. Lookups can be conducted in parallel, but inserts need to be "re-combined" into a single thread. You could say that lookup operations could be done without spawning a new thread for the client; however, they can take a while, causing the server to lock up and drop new requests. The design tries to minimize system calls and thread creation as much as possible.

But now that I know it's not safe the way I first thought, I should be able to cobble something together.

Thanks

From opengroup.org on pthread_join:

The results of multiple simultaneous calls to pthread_join() specifying the same target thread are undefined.

So you really should not have several threads joining your previous insertThread.

First, as you use C++, I recommend boost.thread. It resembles the POSIX model of threads and also works on Windows. It also helps you with C++, i.e. by making function objects easier to use.

Second, why do you want to start a new thread for inserting an element when you always have to wait for the previous one to finish before you start the next one? That does not seem to be a classical use of multiple threads.

Although... one classical solution to this would be to have one worker thread getting jobs from an event queue, and other threads posting the operation onto the event queue.

If you really just want to keep it more or less the way you have it now, you'd have to do this:

  • Create a condition variable, like insert_finished.
  • All the threads which want to do an insert wait on the condition variable.
  • As soon as one thread is done with its insertion, it fires the condition variable.
  • As the condition variable requires a mutex, you can just notify all waiting threads. They all want to start inserting, but as only one thread can acquire the mutex at a time, all threads will do the insert sequentially.
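
The steps above can be sketched with plain pthreads. This is only a minimal illustration, not the answerer's code: the names insert_mutex, insert_in_progress, guarded_insert and the counter standing in for the real table are all assumptions. The boolean flag is the predicate the waiters re-check, so a broadcast that wakes every waiter still lets only one of them proceed.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>

// All names below are hypothetical and exist only for this sketch.
static pthread_mutex_t insert_mutex    = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  insert_finished = PTHREAD_COND_INITIALIZER;
static bool insert_in_progress = false;  // guarded by insert_mutex
static int  table_size = 0;              // stand-in for the real hash table

static void do_insert() { ++table_size; }  // must never run concurrently

void guarded_insert() {
    // Wait until no other insert is running, then claim the slot.
    pthread_mutex_lock(&insert_mutex);
    while (insert_in_progress)
        pthread_cond_wait(&insert_finished, &insert_mutex);
    insert_in_progress = true;
    pthread_mutex_unlock(&insert_mutex);

    do_insert();  // only one thread at a time gets here

    // Release the slot and notify all waiters; they re-check the flag.
    pthread_mutex_lock(&insert_mutex);
    insert_in_progress = false;
    pthread_cond_broadcast(&insert_finished);
    pthread_mutex_unlock(&insert_mutex);
}

static void* client(void*) {
    for (int i = 0; i < 1000; ++i) guarded_insert();
    return NULL;
}

// Runs four clients; each thread is joined exactly once, by this thread.
int run_gate_demo() {
    pthread_t threads[4];
    for (int i = 0; i < 4; ++i) pthread_create(&threads[i], NULL, client, NULL);
    for (int i = 0; i < 4; ++i) pthread_join(threads[i], NULL);
    assert(!insert_in_progress);
    return table_size;
}
```

If the insert itself is short, holding the mutex for the whole insert would do the same job without the flag or the condition variable.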

But you should take care that your synchronization is not implemented in too ad hoc a way. As this is called insert, I suspect you want to manipulate a data structure, so you probably want to implement a thread-safe data structure first, instead of spreading the synchronization between data-structure accesses and all clients. I also suspect that there will be more operations than just insert, which will need proper synchronization...

According to the Single Unix Specification: "The results of multiple simultaneous calls to pthread_join() specifying the same target thread are undefined."

The "normal way" of letting a single thread take the task would be to set up a condition variable (don't forget the related mutex): idle threads wait in pthread_cond_wait() (or pthread_cond_timedwait()), and when the thread doing the work has finished, it wakes up one of the idle ones with pthread_cond_signal().

Yes, as most people recommended, the best way seems to be to have a worker thread reading from a queue. Some code snippets below:

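    #include <pthread.h>

    pthread_t       insertThread;
    pthread_mutex_t insertConditionNewMutex  = PTHREAD_MUTEX_INITIALIZER;
    pthread_mutex_t insertConditionDoneMutex = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t  insertConditionNew       = PTHREAD_COND_INITIALIZER;
    pthread_cond_t  insertConditionDone      = PTHREAD_COND_INITIALIZER;

    // Predicates for the two condition variables. Without them a signal sent
    // before the matching wait is lost, and spurious wakeups go unnoticed.
    unsigned long batchesSubmitted = 0;  // guarded by insertConditionNewMutex
    unsigned long batchesCompleted = 0;  // guarded by insertConditionDoneMutex

    // Thread for new incoming connection
    void * newBatchInsert()
    {
        for(each Word)  // pseudocode: iterate over the words of this batch
        {
            // Push it into the queue
            pthread_mutex_lock(&lexicon[newPendingWord->length - 1]->insertQueueMutex);
            lexicon[newPendingWord->length - 1]->insertQueue.push(newPendingWord);
            pthread_mutex_unlock(&lexicon[newPendingWord->length - 1]->insertQueueMutex);
        }

        // Send signal to worker thread and take a ticket for this batch
        pthread_mutex_lock(&insertConditionNewMutex);
        unsigned long myBatch = ++batchesSubmitted;
        pthread_cond_signal(&insertConditionNew);
        pthread_mutex_unlock(&insertConditionNewMutex);

        // Wait until it's finished; pthread_cond_wait needs the mutex held
        pthread_mutex_lock(&insertConditionDoneMutex);
        while (batchesCompleted < myBatch)
            pthread_cond_wait(&insertConditionDone, &insertConditionDoneMutex);
        pthread_mutex_unlock(&insertConditionDoneMutex);

        return NULL;
    }

    // Worker thread
    void * insertWorker(void *)
    {
        unsigned long handled = 0;  // last batch number this worker finished

        while(1)
        {
            // Sleep until at least one new batch has been submitted
            pthread_mutex_lock(&insertConditionNewMutex);
            while (batchesSubmitted == handled)
                pthread_cond_wait(&insertConditionNew, &insertConditionNewMutex);
            unsigned long target = batchesSubmitted;
            pthread_mutex_unlock(&insertConditionNewMutex);

            for (int ii = 0; ii < maxWordLength; ++ii)
            {
                // Every access to the queue, including empty() and front(),
                // happens with the queue mutex held
                pthread_mutex_lock(&lexicon[ii]->insertQueueMutex);
                while (!lexicon[ii]->insertQueue.empty())
                {
                    queueNode * newPendingWord = lexicon[ii]->insertQueue.front();
                    lexicon[ii]->insertQueue.pop();
                    pthread_mutex_unlock(&lexicon[ii]->insertQueueMutex);

                    // Do the insert without blocking threads that push
                    lexicon[ii]->insert(newPendingWord->word);

                    pthread_mutex_lock(&lexicon[ii]->insertQueueMutex);
                }
                pthread_mutex_unlock(&lexicon[ii]->insertQueueMutex);
            }

            // Send signal that every batch up to 'target' is done
            pthread_mutex_lock(&insertConditionDoneMutex);
            batchesCompleted = target;
            pthread_cond_broadcast(&insertConditionDone);
            pthread_mutex_unlock(&insertConditionDoneMutex);

            handled = target;
        }
    }

    int main(int argc, char * const argv[])
    {
        pthread_create(&insertThread, NULL, &insertWorker, NULL);

        lexiconServer = new server(serverPort, (void *) newBatchInsert);

        return 0;
    }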

The others have already pointed out that this has undefined behaviour. I'd just add that the simplest way to accomplish your task (allowing only one thread at a time to execute part of the code) is to use a simple mutex: you need the threads executing that code to be MUTually EXclusive, and that's where the mutex got its name :-)
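
As a rough sketch of that suggestion, assuming a std::set as a stand-in for the real table and the hypothetical names table_mutex and locked_insert, the insert simply runs with the mutex held:

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical names; std::set stands in for the real hash table.
static pthread_mutex_t table_mutex = PTHREAD_MUTEX_INITIALIZER;
static std::set<int> table;  // guarded by table_mutex

// Any thread may call this; the mutex serialises the inserts.
void locked_insert(int key) {
    pthread_mutex_lock(&table_mutex);
    table.insert(key);
    pthread_mutex_unlock(&table_mutex);
}

static void* inserter(void* arg) {
    int base = *static_cast<int*>(arg);
    for (int i = 0; i < 1000; ++i) locked_insert(base + i);
    return NULL;
}

// Four threads insert disjoint key ranges concurrently.
std::size_t run_mutex_demo() {
    pthread_t threads[4];
    int bases[4] = {0, 1000, 2000, 3000};
    for (int i = 0; i < 4; ++i)
        pthread_create(&threads[i], NULL, inserter, &bases[i]);
    for (int i = 0; i < 4; ++i)
        pthread_join(threads[i], NULL);
    assert(table.count(0) == 1 && table.count(3999) == 1);
    return table.size();
}
```

No extra thread is created per insert, and nobody needs to join anything except the thread that created the workers.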

If you need the code to be run in a specific thread (like Java AWT), then you need condition variables. However, you should think twice about whether this solution actually pays off. Imagine how many context switches you need if you call your "Insert operation" 10000 times per second.

As you have just mentioned that you're using a hash table with several lookups running in parallel with insertions, I'd recommend checking whether you can use a concurrent hash table.

As the exact lookup results are non-deterministic when you're inserting elements simultaneously, such a concurrent hash map may be exactly what you need. I have not used concurrent hash tables in C++, but as they are available in Java, you will very likely find a library doing this in C++.

The only library I found that supports inserts without locking out new lookups is Sunrise DD (and I'm not sure whether it supports concurrent inserts).

However, switching from Google's Sparse Hash map more than doubles the memory usage. Lookups should happen fairly infrequently, so rather than trying to write my own library that combines the advantages of both, I would rather just lock the table, suspending lookups while changes are made safely.
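
One way to lock the table so that lookups are suspended only while a change is being made is a POSIX read-write lock. The sketch below is an assumption about how that could look, not the code actually used: std::map stands in for the sparse hash map, and table_lock, table_insert and table_lookup are hypothetical names. Lookups share the lock with each other; an insert takes it exclusively.

```cpp
#include <pthread.h>
#include <cassert>
#include <map>
#include <string>

// Hypothetical names; std::map stands in for the real hash map.
static pthread_rwlock_t table_lock = PTHREAD_RWLOCK_INITIALIZER;
static std::map<std::string, int> table;  // guarded by table_lock

// Writers take the lock exclusively, suspending all lookups.
void table_insert(const std::string& key, int value) {
    pthread_rwlock_wrlock(&table_lock);
    table[key] = value;
    pthread_rwlock_unlock(&table_lock);
}

// Readers share the lock, so lookups still run in parallel.
bool table_lookup(const std::string& key, int* value_out) {
    pthread_rwlock_rdlock(&table_lock);
    std::map<std::string, int>::const_iterator it = table.find(key);
    bool found = (it != table.end());
    if (found) *value_out = it->second;
    pthread_rwlock_unlock(&table_lock);
    return found;
}

// Single-threaded usage example.
int run_rwlock_demo() {
    table_insert("apple", 1);
    int value = 0;
    bool found = table_lookup("apple", &value);
    assert(found && value == 1);
    assert(!table_lookup("pear", &value));
    return value;
}
```

Whether waiting writers or readers get priority is implementation-defined, so a steady stream of lookups could delay inserts on some systems.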

Thanks again

It seems to me that you want to serialise inserts to the hash table.

For this you want a lock, not new threads.

From your description, that looks very inefficient, as you are re-creating the insert thread every time you want to insert something. The cost of creating a thread is not zero.

A more common solution to this problem is to spawn one insert thread that waits on a queue (i.e. sits in a loop, sleeping while the queue is empty). Other threads then add work items to the queue. The insert thread picks items off the queue in the order they were added (or by priority if you want) and performs the appropriate action.

All you have to do is make sure that adding to the queue is protected, so that only one thread at a time can modify the actual queue, and that the insert thread does not busy-wait but rather sleeps when nothing is in the queue (see condition variables).
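
A minimal sketch of such an insert thread follows, under stated assumptions: the work items are plain ints, a std::vector stands in for the hash table, and every name (work_queue, post_insert, shutting_down and so on) is hypothetical. The shutdown flag is added only so the demo can terminate.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical names; ints stand in for real work items.
static pthread_mutex_t queue_mutex     = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  queue_not_empty = PTHREAD_COND_INITIALIZER;
static std::queue<int> work_queue;     // guarded by queue_mutex
static bool shutting_down = false;     // guarded by queue_mutex
static std::vector<int> table;         // touched only by the insert thread

// Called by any client thread to hand an item to the insert thread.
void post_insert(int item) {
    pthread_mutex_lock(&queue_mutex);
    work_queue.push(item);
    pthread_cond_signal(&queue_not_empty);
    pthread_mutex_unlock(&queue_mutex);
}

// The single insert thread: sleeps while the queue is empty, never spins.
static void* insert_thread_main(void*) {
    pthread_mutex_lock(&queue_mutex);
    for (;;) {
        while (work_queue.empty() && !shutting_down)
            pthread_cond_wait(&queue_not_empty, &queue_mutex);
        if (work_queue.empty()) break;  // shutting down and fully drained
        int item = work_queue.front();
        work_queue.pop();
        pthread_mutex_unlock(&queue_mutex);
        table.push_back(item);          // the insert, done outside the lock
        pthread_mutex_lock(&queue_mutex);
    }
    pthread_mutex_unlock(&queue_mutex);
    return NULL;
}

static void* producer(void* arg) {
    int base = *static_cast<int*>(arg);
    for (int i = 0; i < 100; ++i) post_insert(base + i);
    return NULL;
}

std::size_t run_queue_demo() {
    pthread_t worker;
    pthread_create(&worker, NULL, insert_thread_main, NULL);
    pthread_t producers[3];
    int bases[3] = {0, 100, 200};
    for (int i = 0; i < 3; ++i)
        pthread_create(&producers[i], NULL, producer, &bases[i]);
    for (int i = 0; i < 3; ++i)
        pthread_join(producers[i], NULL);
    pthread_mutex_lock(&queue_mutex);   // ask the worker to finish
    shutting_down = true;
    pthread_cond_signal(&queue_not_empty);
    pthread_mutex_unlock(&queue_mutex);
    pthread_join(worker, NULL);         // joined once, by one thread
    assert(work_queue.empty());
    return table.size();
}
```

Because only the insert thread touches the table, the table itself needs no lock for inserts; lookups running in other threads would still need their own protection.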

Ideally, you don't want multiple thread pools in a single process, even if they perform different operations. The reusability of a thread is an important architectural consideration, which leads to pthread_join being called in the main thread if you use C.

Of course, for a C++ thread pool, aka ThreadFactory, the idea is to keep the thread primitives abstract so that it can handle any function or operation type passed to it.

A typical example would be a web server, which will have connection pools and thread pools that service connections and then process them further, but all are derived from a common thread pool.

SUMMARY: AVOID PTHREAD_JOIN in any place other than the main thread.
