

Can I safely use global Queues when using multiprocessing in python?

I have a large codebase to parallelise. By using a single global queue I can avoid rewriting the method signatures of hundreds of functions. I know it's messy; please don't tell me that using globals means I'm doing something wrong, because in this case it really is the easiest choice. The code below works, but I don't understand why. I declare a global multiprocessing.Queue() but never declare that it should be shared between processes (by passing it as a parameter to the workers). Does Python automatically place this queue in shared memory? Is it safe to do this on a larger scale?

Note: You can tell that the queue is shared between the processes: the worker processes start on an empty queue and sit idle for one second before the main process pushes some work onto the queue.

import multiprocessing
import time
from queue import Empty  # raised by Queue.get() when the queue is empty

outqueue = None


class WorkerProcess(multiprocessing.Process):
    def __init__(self):
        multiprocessing.Process.__init__(self)
        self.exit = multiprocessing.Event()

    def doWork(self):
        global outqueue
        ob = outqueue.get()
        ob = ob + "!"
        print(ob)
        time.sleep(1)  # simulate more hard work
        outqueue.put(ob)

    def run(self):
        while not self.exit.is_set():
            self.doWork()

    def shutdown(self):
        self.exit.set()


if __name__ == '__main__':
    outqueue = multiprocessing.Queue()

    procs = []
    for x in range(10):
        procs.append(WorkerProcess())
        procs[x].start()

    time.sleep(1)
    for x in range(20):
        outqueue.put(str(x))

    time.sleep(10)
    for p in procs:
        p.shutdown()

    for p in procs:
        p.join()

    try:
        while True:
            x = outqueue.get(False)
            print(x)
    except Empty:
        print("done")

Assuming you're using Linux, the answer lies in the way the OS creates a new process.

When a process spawns a new one on Linux, it actually forks the parent. The result is a child process with all the properties of the parent one: basically a clone.

In your example you instantiate the Queue and then create the new processes. Therefore the child processes have a copy of the same queue and are able to use it.

To see things break, just try to create the processes first and then create the Queue object. You'll see that the children still have the global variable set to None, while the parent has a Queue.

It is safe, yet not recommended, to share a Queue as a global variable on Linux. On Windows, due to the different process creation approach, sharing a queue through a global variable won't work.

As mentioned in the programming guidelines:

Explicitly pass resources to child processes

On Unix using the fork start method, a child process can make use of a shared resource created in a parent process using a global resource. However, it is better to pass the object as an argument to the constructor for the child process.

Apart from making the code (potentially) compatible with Windows and the other start methods, this also ensures that as long as the child process is still alive the object will not be garbage collected in the parent process. This might be important if some resource is freed when the object is garbage collected in the parent process.

For more info about Linux forking you can read its man page.
