
Why can't I use multiprocessing.Queue with ProcessPoolExecutor?

When I run the following code:

from concurrent.futures import ProcessPoolExecutor, as_completed
from multiprocessing import Queue

q = Queue()

def my_task(x, queue):
    queue.put("Task Complete")
    return x

with ProcessPoolExecutor() as executor:
    tasks = [executor.submit(my_task, i, q) for i in range(10)]
    for task in as_completed(tasks):
        print(task.result())

I get this error:

concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 244, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/usr/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 58, in __getstate__
    context.assert_spawning(self)
  File "/usr/lib/python3.10/multiprocessing/context.py", line 373, in assert_spawning
    raise RuntimeError(
RuntimeError: Queue objects should only be shared between processes through inheritance
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/tmp/nn.py", line 14, in <module>
    print(task.result())
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 244, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/usr/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 58, in __getstate__
    context.assert_spawning(self)
  File "/usr/lib/python3.10/multiprocessing/context.py", line 373, in assert_spawning
    raise RuntimeError(
RuntimeError: Queue objects should only be shared between processes through inheritance

If I can't use it with multiprocessing, what is the purpose of multiprocessing.Queue? How can I make this work? In my actual code, I need each worker to regularly update a queue with task status so that another thread can take data from that queue to feed a progress bar.

Short Explanation

Why can't you pass a multiprocessing.Queue as a worker function argument? The short answer is that submitted tasks are placed on a transparent input queue from which the pool processes get the next task to execute, and those arguments must be serializable with pickle. A multiprocessing.Queue is not in general serializable, but it is serializable for the special case of being passed as a function argument to a child process. Arguments to a multiprocessing.Process are stored as an attribute of the instance when it is created; when start is called on the instance, that state must be serialized into the new address space before the run method is invoked there. Why this serialization works for that case but not in general is unclear to me; I would have to spend considerable time looking at the interpreter's source code to come up with a definitive answer.

See what happens when I try to put a queue instance onto a queue:

>>> from multiprocessing import Queue
>>> q1 = Queue()
>>> q2 = Queue()
>>> q1.put(q2)
>>> Traceback (most recent call last):
  File "C:\Program Files\Python38\lib\multiprocessing\queues.py", line 239, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "C:\Program Files\Python38\lib\multiprocessing\reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "C:\Program Files\Python38\lib\multiprocessing\queues.py", line 58, in __getstate__
    context.assert_spawning(self)
  File "C:\Program Files\Python38\lib\multiprocessing\context.py", line 359, in assert_spawning
    raise RuntimeError(
RuntimeError: Queue objects should only be shared between processes through inheritance

>>> import pickle
>>> b = pickle.dumps(q2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Program Files\Python38\lib\multiprocessing\queues.py", line 58, in __getstate__
    context.assert_spawning(self)
  File "C:\Program Files\Python38\lib\multiprocessing\context.py", line 359, in assert_spawning
    raise RuntimeError(
RuntimeError: Queue objects should only be shared between processes through inheritance
>>>
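By contrast, inheritance-style sharing does work when you hand the queue to an explicitly created multiprocessing.Process. Here is a minimal sketch (the worker function name is made up for illustration) showing that the same kind of queue that cannot be pickled above is accepted without complaint as a Process argument:

from multiprocessing import Process, Queue

def worker(q):
    # Runs in the child process; the queue was handed over at process creation.
    q.put("Task Complete")

if __name__ == '__main__':
    q = Queue()
    # Passing the queue in args works: Process serializes its own arguments
    # as part of spawning the child, the one sanctioned sharing mechanism.
    p = Process(target=worker, args=(q,))
    p.start()
    print(q.get())  # -> Task Complete
    p.join()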

How to Pass the Queue via Inheritance

First of all, your code would run more slowly using multiprocessing than if you simply called my_task in a loop, because multiprocessing introduces additional overhead (process startup and moving data across address spaces). That requires the gains from running my_task in parallel to more than offset the extra overhead; in your case they don't, because my_task is not CPU-intensive enough to justify multiprocessing.

That said, when you want pool processes to use a multiprocessing.Queue instance, it cannot be passed as an argument to the worker function (unlike the case where you use explicit multiprocessing.Process instances instead of a pool). Instead, you must initialize a global variable in each pool process with the queue instance.

If you are running under a platform that creates new processes with fork, then you can simply create queue as a global and it will be inherited by each pool process:

from concurrent.futures import ProcessPoolExecutor, as_completed
from multiprocessing import Queue

queue = Queue()

def my_task(x):
    queue.put("Task Complete")
    return x

with ProcessPoolExecutor() as executor:
    tasks = [executor.submit(my_task, i) for i in range(10)]
    for task in as_completed(tasks):
        print(task.result())
    # This queue must be read before the pool terminates:
    for _ in range(10):
        print(queue.get())

Prints:

1
0
2
3
6
5
4
7
8
9
Task Complete
Task Complete
Task Complete
Task Complete
Task Complete
Task Complete
Task Complete
Task Complete
Task Complete
Task Complete

If you need portability with platforms that do not create new processes with the fork method, such as Windows (which uses the spawn method), then you cannot allocate the queue as a global, since each pool process would create its own queue instance. Instead, the main process must create the queue and then initialize each pool process's global queue variable using the initializer and initargs arguments:

from concurrent.futures import ProcessPoolExecutor, as_completed
from multiprocessing import Queue

def init_pool_processes(q):
    global queue
    queue = q

def my_task(x):
    queue.put("Task Complete")
    return x

# Windows compatibility
if __name__ == '__main__':
    q = Queue()

    with ProcessPoolExecutor(initializer=init_pool_processes, initargs=(q,)) as executor:
        tasks = [executor.submit(my_task, i) for i in range(10)]
        for task in as_completed(tasks):
            print(task.result())
        # This queue must be read before the pool terminates:
        for _ in range(10):
            print(q.get())

If you just want to advance a progress bar as each task completes (you haven't said exactly how the bar is to advance; see my comment on your question), then the following shows that a queue is not necessary. If instead each submitted task consists of N parts (for a total of 10 * N parts, since there are 10 tasks) and you would like to see a single progress bar advance as each part completes, then a queue is probably the most straightforward way of signaling the main process that a part has been completed.

from concurrent.futures import ProcessPoolExecutor, as_completed
from tqdm import tqdm

def my_task(x):
    return x

# Windows compatibility
if __name__ == '__main__':
    with ProcessPoolExecutor() as executor:
        with tqdm(total=10) as bar:
            tasks = [executor.submit(my_task, i) for i in range(10)]
            for _ in as_completed(tasks):
                bar.update()
            # To get the results in task submission order:
            results = [task.result() for task in tasks]
    print(results)
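For the N-parts case, here is a minimal sketch of the queue-based approach, building on the initializer pattern above. PARTS_PER_TASK and progress_reader are made-up names for illustration: each worker puts one item on the queue per finished part, and a helper thread in the main process drains the queue to advance a single bar:

from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Queue
from threading import Thread
from tqdm import tqdm

N_TASKS = 10
PARTS_PER_TASK = 4  # hypothetical number of parts per task

def init_pool_processes(q):
    global queue
    queue = q

def my_task(x):
    for _ in range(PARTS_PER_TASK):
        ...  # do one part of the work here
        queue.put("part done")  # signal the main process
    return x

def progress_reader(q, total):
    # Runs in a thread of the main process, advancing the bar once per part.
    with tqdm(total=total) as bar:
        for _ in range(total):
            q.get()
            bar.update()

# Windows compatibility
if __name__ == '__main__':
    q = Queue()
    reader = Thread(target=progress_reader, args=(q, N_TASKS * PARTS_PER_TASK))
    reader.start()
    with ProcessPoolExecutor(initializer=init_pool_processes, initargs=(q,)) as executor:
        results = list(executor.map(my_task, range(N_TASKS)))
    reader.join()
    print(results)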
