Can I use a `multiprocessing.Queue` for communication within a process pool?
Why can't I use `multiprocessing.Queue` with `ProcessPoolExecutor`?
When I run the following code:
from concurrent.futures import ProcessPoolExecutor, as_completed
from multiprocessing import Queue

q = Queue()

def my_task(x, queue):
    queue.put("Task Complete")
    return x

with ProcessPoolExecutor() as executor:
    tasks = [executor.submit(my_task, i, q) for i in range(10)]
    for task in as_completed(tasks):
        print(task.result())
I get this error:
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 244, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/usr/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 58, in __getstate__
    context.assert_spawning(self)
  File "/usr/lib/python3.10/multiprocessing/context.py", line 373, in assert_spawning
    raise RuntimeError(
RuntimeError: Queue objects should only be shared between processes through inheritance
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/tmp/nn.py", line 14, in <module>
    print(task.result())
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 244, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/usr/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 58, in __getstate__
    context.assert_spawning(self)
  File "/usr/lib/python3.10/multiprocessing/context.py", line 373, in assert_spawning
    raise RuntimeError(
RuntimeError: Queue objects should only be shared between processes through inheritance
If I can't use it for multiprocessing, then what is the purpose of `multiprocessing.Queue`? How can I make this work? In my real code, each worker needs to post frequent status updates to the queue, so that another thread can consume them to drive a progress bar.
Short explanation
Why can't you pass a `multiprocessing.Queue` as a worker function argument? The short answer is that submitted tasks are placed on a transparent input queue, from which the pool processes get the next task to execute. But those arguments must be serializable with `pickle`, and a `multiprocessing.Queue` is not, in general, serializable. It is serializable only for the special case of passing it as an argument to a child process: arguments to a `multiprocessing.Process` are stored as an attribute of the instance when it is created, and when `start` is called on the instance, its state must be serialized to the new address space before the `run` method is invoked in that address space. Why this serialization works for that case but not the general case is not clear to me; I would have to spend a lot of time looking at the interpreter source to come up with a definitive answer.
See what happens when I try to put a queue instance on a queue:
>>> from multiprocessing import Queue
>>> q1 = Queue()
>>> q2 = Queue()
>>> q1.put(q2)
>>> Traceback (most recent call last):
  File "C:\Program Files\Python38\lib\multiprocessing\queues.py", line 239, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "C:\Program Files\Python38\lib\multiprocessing\reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "C:\Program Files\Python38\lib\multiprocessing\queues.py", line 58, in __getstate__
    context.assert_spawning(self)
  File "C:\Program Files\Python38\lib\multiprocessing\context.py", line 359, in assert_spawning
    raise RuntimeError(
RuntimeError: Queue objects should only be shared between processes through inheritance
>>> import pickle
>>> b = pickle.dumps(q2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Program Files\Python38\lib\multiprocessing\queues.py", line 58, in __getstate__
    context.assert_spawning(self)
  File "C:\Program Files\Python38\lib\multiprocessing\context.py", line 359, in assert_spawning
    raise RuntimeError(
RuntimeError: Queue objects should only be shared between processes through inheritance
>>>
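By contrast, passing the queue as an argument to an explicit `multiprocessing.Process` does work, because the special-case serialization described above is applied when the child is started. A minimal sketch (the function and message names are illustrative):

```python
from multiprocessing import Process, Queue

def worker(q):
    # The queue arrives via the Process-creation (inheritance) path,
    # not via ordinary pickling, so this is allowed.
    q.put("hello from child")

if __name__ == '__main__':
    q = Queue()
    p = Process(target=worker, args=(q,))  # allowed: inheritance path
    p.start()
    print(q.get())
    p.join()
```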
How to pass the queue via inheritance
First, an aside: your code as posted would actually run more slowly than if you just called `my_task` in a loop, because multiprocessing introduces extra overhead (process startup and moving data across address spaces) that must be more than offset by the gains from running `my_task` in parallel. In your case it isn't, because `my_task` is not CPU-intensive enough to justify multiprocessing.
That said, when you want the pool processes to use a `multiprocessing.Queue` instance, it cannot be passed as an argument to the worker function (unlike the case where you use explicit `multiprocessing.Process` instances instead of a pool). Instead, you must initialize a global variable in each pool process with the queue instance.
If you are running on a platform that uses fork to create new processes, you can simply create `queue` as a global and it will be inherited by each pool process:
from concurrent.futures import ProcessPoolExecutor, as_completed
from multiprocessing import Queue

queue = Queue()

def my_task(x):
    queue.put("Task Complete")
    return x

with ProcessPoolExecutor() as executor:
    tasks = [executor.submit(my_task, i) for i in range(10)]
    for task in as_completed(tasks):
        print(task.result())
    # This queue must be read before the pool terminates:
    for _ in range(10):
        print(queue.get())
Prints:
1
0
2
3
6
5
4
7
8
9
Task Complete
Task Complete
Task Complete
Task Complete
Task Complete
Task Complete
Task Complete
Task Complete
Task Complete
Task Complete
If you need portability to platforms that do not use the fork method of creating processes, such as Windows (which uses the spawn method), then you cannot assign the queue as a global, since each pool process would create its own queue instance. Instead, the main process must create the queue and then initialize each pool process's global `queue` variable by using the pool's initializer and initargs arguments:
from concurrent.futures import ProcessPoolExecutor, as_completed
from multiprocessing import Queue

def init_pool_processes(q):
    global queue
    queue = q

def my_task(x):
    queue.put("Task Complete")
    return x

# Windows compatibility
if __name__ == '__main__':
    q = Queue()

    with ProcessPoolExecutor(initializer=init_pool_processes, initargs=(q,)) as executor:
        tasks = [executor.submit(my_task, i) for i in range(10)]
        for task in as_completed(tasks):
            print(task.result())
        # This queue must be read before the pool terminates:
        for _ in range(10):
            print(q.get())
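For your actual use case, where another thread consumes the status updates to drive a progress bar, a sketch of one way to structure it under the same initializer/initargs approach (the `progress_listener` function is illustrative; replace the comment with your actual bar update):

```python
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Queue
from threading import Thread

def init_pool_processes(q):
    global queue
    queue = q

def my_task(x):
    queue.put("Task Complete")  # status update read by the main process
    return x

def progress_listener(q, n_updates):
    # Runs in a thread of the main process, draining status updates.
    for _ in range(n_updates):
        q.get()
        # advance your progress bar here

if __name__ == '__main__':
    q = Queue()
    listener = Thread(target=progress_listener, args=(q, 10))
    listener.start()
    with ProcessPoolExecutor(initializer=init_pool_processes, initargs=(q,)) as executor:
        tasks = [executor.submit(my_task, i) for i in range(10)]
        results = [task.result() for task in tasks]
    listener.join()
    print(results)
```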
If you just want to advance a progress bar as each task completes (you haven't said exactly how the bar is advanced; see my comment on your question), then the following shows that a queue is not necessary. If instead each submitted task consists of N parts (for a total of 10 * N parts, since there are 10 tasks) and you want to see the bar advance as each part completes, then a queue is probably the most straightforward way of signaling part completions back to the main process.
from concurrent.futures import ProcessPoolExecutor, as_completed
from tqdm import tqdm

def my_task(x):
    return x

# Windows compatibility
if __name__ == '__main__':
    with ProcessPoolExecutor() as executor:
        with tqdm(total=10) as bar:
            tasks = [executor.submit(my_task, i) for i in range(10)]
            for _ in as_completed(tasks):
                bar.update()
        # To get the results in task submission order:
        results = [task.result() for task in tasks]
        print(results)
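For the N-parts-per-task case, a sketch combining the queue (via initializer/initargs, as above) with a listener thread that advances the bar once per part; `N` and the part loop are illustrative placeholders for your real work:

```python
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Queue
from threading import Thread
from tqdm import tqdm

N = 4  # illustrative: parts per task

def init_pool_processes(q):
    global queue
    queue = q

def my_task(x):
    for part in range(N):
        # ... do one part of the work ...
        queue.put(1)  # signal one part complete
    return x

def update_bar(q, total):
    # Advance the bar once for each part-completion message.
    with tqdm(total=total) as bar:
        for _ in range(total):
            q.get()
            bar.update()

if __name__ == '__main__':
    q = Queue()
    listener = Thread(target=update_bar, args=(q, 10 * N))
    listener.start()
    with ProcessPoolExecutor(initializer=init_pool_processes, initargs=(q,)) as executor:
        results = list(executor.map(my_task, range(10)))
    listener.join()
    print(results)
```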