
Multiprocessing queue closing signal

Suppose I have a number of items that I put in a queue for other processes to deal with. The items are rather large in memory, therefore I limit the queue size. At some point I will have no more things to put in the queue. How can I signal the other processes that the queue is closed?

One option would be to close the child processes when the queue is empty, but this relies on the queue being emptied slower than it is being filled.

The documentation of multiprocessing.Queue describes the following method:

close()

Indicate that no more data will be put on this queue by the current process. The background thread will quit once it has flushed all buffered data to the pipe. This is called automatically when the queue is garbage collected.

Is it safe to call close while there are still items in the queue? Are these items guaranteed to be processed? How can another process know that the queue is closed?

Is this a theoretical question, or do you have some code you're trying to get to work?

Answering the first question: yes, you can call the close() method on a multiprocessing.Queue while there are still items in the queue, but note that the method only indicates to other processes that no more data will be put on the queue by the current process. The items that are already in the queue will still be processed by the other processes.

And you could place a sentinel value in the queue that the other processes can then check for.

Example of the check:

from multiprocessing import Process, Queue, Event
from queue import Empty

def worker(queue, event):
    # Continuously check the event flag while it is not set
    while not event.is_set():
        try:
            # Get an item from the queue with a 1-second timeout
            item = queue.get(timeout=1)
            if item is None:
                event.set()
                print("Worker: Queue is closed")
                break
            # Process the item
            print("Worker: Processing item {}".format(item))
        except Empty:
            # The queue was empty and the timeout was reached; try again
            pass

def handleQue():
    # Create a queue and an event flag
    queue = Queue()
    event = Event()
    # Start three worker processes
    processes = [Process(target=worker, args=(queue, event)) for _ in range(3)]
    for process in processes:
        process.start()
    # Put items in the queue
    for i in range(10):
        queue.put(i)
        print("Main Process: Putting item {} in the queue".format(i))
    # Signal to the other processes that the queue is closed,
    # one sentinel per worker
    for _ in range(3):
        queue.put(None)
        print("Main Process: Putting sentinel value in the queue")
    for process in processes:
        process.join()
    event.clear()

if __name__ == "__main__":
    handleQue()

This is a common scenario: how do I tell all queue consumers that no more items will be enqueued? Multiprocessing apps using POSIX message queues, datagram sockets, or even just named pipes, for example, might all face this.

The easiest thing to do here would be to enqueue a single, special "all done" message, which each consumer receives and then put()s back on the queue for the next consumer to do the same.
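A sketch of that pass-it-along pattern (the ALL_DONE marker and the consumer function are illustrative names, not part of any library):

```python
from multiprocessing import Process, Queue

# Illustrative sentinel; any unique, comparable marker works.
ALL_DONE = "ALL_DONE"

def consumer(queue):
    while True:
        item = queue.get()
        if item == ALL_DONE:
            # Put the sentinel back so the next consumer sees it too,
            # then exit.
            queue.put(ALL_DONE)
            break
        print("consuming", item)

if __name__ == "__main__":
    q = Queue()
    consumers = [Process(target=consumer, args=(q,)) for _ in range(3)]
    for c in consumers:
        c.start()
    for i in range(10):
        q.put(i)
    q.put(ALL_DONE)  # a single sentinel, handed from consumer to consumer
    for c in consumers:
        c.join()
```

The advantage over one sentinel per worker is that the producer does not need to know how many consumers there are; the trade-off is that the sentinel stays in the queue after everyone has exited.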

(close() is indeed safe but inapplicable here. Any "in flight" items will be safely enqueued, but close() doesn't tell the consumers that no more producers remain.)

A multiprocessing queue is simply a pipe with a lock to avoid concurrent reads/writes from different processes.

A pipe typically has two ends, a read end and a write end. When a process tries to read from a pipe, the OS first serves whatever is already in the pipe. If the pipe is empty, the OS suspends the reading process and checks whether any process can still write to the write end. If one can, the reader stays suspended until someone writes to the pipe; if no one can write to it anymore, the OS delivers an end-of-file to the reader, which wakes it up and effectively tells it: "don't wait for a message, no one can send a message on this pipe."
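This end-of-file behaviour is easy to see with a raw OS pipe:

```python
import os

r, w = os.pipe()
os.write(w, b"hello")
os.close(w)              # no one can write to the pipe anymore

# Data already buffered in the pipe is served first...
print(os.read(r, 100))   # b'hello'
# ...then the OS signals end-of-file with an empty read instead of blocking.
print(os.read(r, 100))   # b''
os.close(r)
```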

In the case of a queue it is different: the reading process holds both the read and the write end of the underlying pipe, so the number of processes that can write to the queue is never zero. Reading from a queue that no other process can write to will therefore pause the program indefinitely; the reader has no direct way of knowing that the other processes have closed the queue when they do.

The way the multiprocessing library itself handles this in its pools is to send a message on the queue that terminates the workers. For example, a reader can terminate once it sees None on the queue, or some predefined object or string like "END" or "CLOSE". Since this will be the last item on the queue, there should be no items after it, and once a reader reads it, it terminates. If you have multiple readers, you should send multiple end messages on the queue.
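A compact way to write such a reader loop is the two-argument iter() idiom, which keeps calling queue.get until it returns the sentinel (here None):

```python
from multiprocessing import Process, Queue

def reader(queue):
    # iter(queue.get, None) calls queue.get() repeatedly until it
    # returns the sentinel None; then the for-loop ends and the
    # worker exits.
    for item in iter(queue.get, None):
        print("processing", item)

if __name__ == "__main__":
    q = Queue()
    workers = [Process(target=reader, args=(q,)) for _ in range(2)]
    for w in workers:
        w.start()
    for i in range(5):
        q.put(i)
    for _ in workers:   # one end message per reader
        q.put(None)
    for w in workers:
        w.join()
```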

But what if a child process crashes, or for some reason never sends the end message? Your main process will be stuck on the get and suspended indefinitely. So if you are using a queue manually, you should take precautions to make sure this doesn't happen (like setting a timeout, monitoring the other writers in another thread, etc.).
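One possible precaution is sketched below (safe_drain is a made-up helper name, not a library function): combine a get timeout with a liveness check on the producer processes, so the reader gives up once no producer can send anything anymore.

```python
from queue import Empty

def safe_drain(queue, producers, timeout=1):
    # Poll the queue with a timeout; if no item arrives and every
    # producer process has already exited, stop waiting instead of
    # hanging forever on a queue no one can write to.
    results = []
    while True:
        try:
            item = queue.get(timeout=timeout)
        except Empty:
            if not any(p.is_alive() for p in producers):
                break  # producers are gone; nothing more will arrive
            continue
        if item is None:  # normal end-message sentinel
            break
        results.append(item)
    return results
```

The producers argument is assumed to be a list of multiprocessing.Process objects (or anything with an is_alive() method), so the same loop handles both the normal sentinel and a crashed producer.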
