
How to understand multiprocessing.Queue when working with multiprocessing.Pool?

Why can't I put a process from a Pool into a Queue?

My code below works when using a Pool and a plain list, and I can get the Test instance attributes.

from multiprocessing import Pool
from multiprocessing import Queue


class Test(object):
    def __init__(self, num):
        self.num = num


if __name__ == '__main__':
    p = Pool()
    procs = []
    for i in range(5):
        proc = p.apply_async(Test, args=(i,))
        procs.append(proc)
    p.close()
    for each in procs:
        test = each.get(10)
        print(test.num)
    p.join()

When I try to use a Queue instead of a Python list to store the processes, it doesn't work.

My code:

from multiprocessing import Pool
from multiprocessing import Queue


class Test(object):
    def __init__(self, num):
        self.num = num


if __name__ == '__main__':
    p = Pool()
    q = Queue()
    for i in range(5):
        proc = p.apply_async(Test, args=(i,))
        q.put(proc)
    p.close()
    while not q.empty():
        q.get()
    p.join()

Error message:

Traceback (most recent call last):
  File "C:\Users\laich\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\queues.py", line 234, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "C:\Users\laich\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
TypeError: can't pickle _thread.lock objects

I went to look at the multiprocessing documentation:

class multiprocessing.Queue([maxsize])
Returns a process shared queue implemented using a pipe and a few locks/semaphores. When a process first puts an item on the queue a feeder thread is started which transfers objects from a buffer into the pipe.

The usual queue.Empty and queue.Full exceptions from the standard library's queue module are raised to signal timeouts.

Queue implements all the methods of queue.Queue except for task_done() and join().

Here it says "puts an item" — can this item be any Python object? In my case I'm trying to put a process from a Pool() into a Queue.

There are at least two problems with your Queue-based code. The Pool.apply_async method returns an AsyncResult object, not a process. You can call get on this object to obtain the result of the corresponding task. With this difference in mind, let's look at your code:

proc = p.apply_async(Test, args=(i,)) # Returns an AsyncResult object
q.put(proc) # won't work

The second line will always fail in your case. Anything that you put in the queue must be picklable, because multiprocessing.Queue uses serialization. This is not well documented, and there is an open issue in Python's issue tracker to update the documentation. The problem is that AsyncResult is not picklable. You can try it yourself:

import pickle
import multiprocessing as mp


def identity(x):
    return x


if __name__ == '__main__':
    with mp.Pool() as p:
        result = p.apply_async(identity, (1,))  # a named function, since a lambda
                                                # can't be pickled to the worker
    pickle.dumps(result)  # TypeError: can't pickle _thread.lock objects
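The error is not specific to AsyncResult; any object holding a thread lock triggers it. A minimal check (the exact error wording varies slightly between Python versions):

```python
import pickle
import threading

# A bare lock is enough to reproduce the pickling failure that the
# Queue's feeder thread reported in the traceback above.
try:
    pickle.dumps(threading.Lock())
except TypeError as exc:
    print(exc)  # e.g. "cannot pickle '_thread.lock' object"
```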

AsyncResult contains some lock objects internally, and they are not serializable. Let's move on to the next problem:

while not q.empty():
    q.get()

If I'm not mistaken, in the code above you actually want to call AsyncResult.get, not Queue.get. In that case you would first get the object from the queue and then call the corresponding method on it. However, that can never happen in your code, since the AsyncResult is not serializable and never makes it into the queue in the first place.
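For completeness, the intended pattern works when the AsyncResult objects are kept in a local container and AsyncResult.get is called on each one (square here is a placeholder task):

```python
from multiprocessing import Pool


def square(x):
    return x * x


if __name__ == '__main__':
    with Pool() as p:
        # Keep the AsyncResult objects in a plain list; they never need to
        # cross a process boundary, so no pickling of them is involved.
        results = [p.apply_async(square, args=(i,)) for i in range(5)]
        print([r.get(timeout=10) for r in results])  # [0, 1, 4, 9, 16]
```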

As @Mehdi Sadeghi explained, AsyncResult objects can't be pickled, which multiprocessing.Queue requires. However, you don't need a multiprocessing queue here, because the queue isn't being shared among processes. This means you can just use a regular queue.Queue.

from multiprocessing import Pool
#from multiprocessing import Queue
from queue import Queue


class Test(object):
    def __init__(self, num):
        self.num = num
        print('Test({!r}) created'.format(num))


if __name__ == '__main__':
    p = Pool()
    q = Queue()
    for i in range(5):
        proc = p.apply_async(Test, args=(i,))
        q.put(proc)
    p.close()
    while not q.empty():
        q.get()  # returns the AsyncResult; its result is simply discarded here
    p.join()

    print('done')

Output:

Test(0) created
Test(1) created
Test(2) created
Test(3) created
Test(4) created
done
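Note that the while loop discards each AsyncResult without waiting on it; the "created" lines are printed by the workers themselves. To actually retrieve the Test instances, call get on each result as it comes off the queue — a sketch reusing the same Test class:

```python
from multiprocessing import Pool
from queue import Queue  # a plain thread-local queue, not multiprocessing.Queue


class Test(object):
    def __init__(self, num):
        self.num = num


if __name__ == '__main__':
    p = Pool()
    q = Queue()
    for i in range(5):
        q.put(p.apply_async(Test, args=(i,)))
    p.close()
    while not q.empty():
        # Queue.get returns the AsyncResult; AsyncResult.get returns
        # the Test instance built in the worker process.
        test = q.get().get(timeout=10)
        print(test.num)
    p.join()
```

Since queue.Queue is FIFO and the results are fetched in submission order, the numbers print as 0 through 4.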
