Can a python Multiprocessing queue be passed to the child process?
Python Multiprocessing: Adding to Queue Within Child Process
I want to implement a file crawler that stores data to Mongo. I would like to use multiprocessing as a way to hand off blocking tasks such as unzipping files, crawling files, and uploading to Mongo. Some tasks depend on other tasks (i.e., a file needs to be unzipped before it can be crawled), so I want to be able to complete a task and then add new tasks to the same task queue.

Here is what I have so far:
```python
import multiprocessing


class Worker(multiprocessing.Process):
    def __init__(self, task_queue: multiprocessing.Queue):
        super(Worker, self).__init__()
        self.task_queue = task_queue

    def run(self):
        for (function, *args) in iter(self.task_queue.get, None):
            print(f'Running: {function.__name__}({*args,})')
            # Run the provided function with its parameters in child process
            function(*args)
            self.task_queue.task_done()


def foo(task_queue: multiprocessing.Queue) -> None:
    print('foo')
    # Add new task to queue from this child process
    task_queue.put((bar, 1))


def bar(x: int) -> None:
    print(f'bar: {x}')


def main():
    # Start workers on separate processes
    workers = []
    manager = multiprocessing.Manager()
    task_queue = manager.Queue()

    for i in range(multiprocessing.cpu_count()):
        worker = Worker(task_queue)
        workers.append(worker)
        worker.start()

    # Run foo on child process using the queue as parameter
    task_queue.put((foo, task_queue))

    for _ in workers:
        task_queue.put(None)

    # Block until workers complete and join main process
    for worker in workers:
        worker.join()

    print('Program completed.')


if __name__ == '__main__':
    main()
```
Expected behavior:

```
Running: foo((<AutoProxy[Queue] object, typeid 'Queue' at 0x1b963548908>,))
foo
Running: bar((1,))
bar: 1
Program completed.
```
Actual behavior:

```
Running: foo((<AutoProxy[Queue] object, typeid 'Queue' at 0x1b963548908>,))
foo
Program completed.
```
I am new to multiprocessing, so any help would be greatly appreciated.
As @FrankYellin pointed out, this happens because `None` is put into `task_queue` before `bar` gets added.
Assuming the queue is either non-empty or waiting on an in-flight task for the entire duration of the program (which is true in my case), the queue's `join` method can be used. Per the documentation:
> Blocks until all items in the queue have been gotten and processed.
>
> The count of unfinished tasks goes up whenever an item is added to the queue. The count goes down whenever a consumer thread calls task_done() to indicate that the item was retrieved and all work on it is complete. When the count of unfinished tasks drops to zero, join() unblocks.
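This counting behavior can be seen in isolation with a minimal sketch (using a manager queue as in the question; the consumer function and the item values are made up for illustration). `q.join()` returns as soon as `task_done()` has been called once per item, even though the consumer process is still alive and waiting for a sentinel:

```python
import multiprocessing


def consumer(q) -> None:
    # Pull items until the None sentinel, marking each one as done.
    for item in iter(q.get, None):
        q.task_done()


def main() -> None:
    manager = multiprocessing.Manager()
    q = manager.Queue()

    for i in range(3):
        q.put(i)  # unfinished-task count is now 3

    p = multiprocessing.Process(target=consumer, args=(q,))
    p.start()

    q.join()  # unblocks once task_done() has been called 3 times
    print('all tasks done')

    q.put(None)  # now release the consumer loop
    p.join()


if __name__ == '__main__':
    main()
```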
Here is the updated code:
```python
import multiprocessing


class Worker(multiprocessing.Process):
    def __init__(self, task_queue: multiprocessing.Queue):
        super(Worker, self).__init__()
        self.task_queue = task_queue

    def run(self):
        for (function, *args) in iter(self.task_queue.get, None):
            print(f'Running: {function.__name__}({*args,})')
            # Run the provided function with its parameters in child process
            function(*args)
            self.task_queue.task_done()  # <-- Notify queue that task is complete


def foo(task_queue: multiprocessing.Queue) -> None:
    print('foo')
    # Add new task to queue from this child process
    task_queue.put((bar, 1))


def bar(x: int) -> None:
    print(f'bar: {x}')


def main():
    # Start workers on separate processes
    workers = []
    manager = multiprocessing.Manager()
    task_queue = manager.Queue()

    for i in range(multiprocessing.cpu_count()):
        worker = Worker(task_queue)
        workers.append(worker)
        worker.start()

    # Run foo on child process using the queue as parameter
    task_queue.put((foo, task_queue))

    # Block until all items in queue are popped and completed
    task_queue.join()  # <---

    for _ in workers:
        task_queue.put(None)

    # Block until workers complete and join main process
    for worker in workers:
        worker.join()

    print('Program completed.')


if __name__ == '__main__':
    main()
```
This seems to work correctly. I will update this answer if I find anything new. Thank you all.
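As a side note on why the Manager queue is needed at all: a plain `multiprocessing.Queue` (or `multiprocessing.JoinableQueue`) cannot be pickled and sent through a queue as task data; it can only be shared with a child by passing it at `Process` creation time, so `task_queue.put((foo, task_queue))` would raise a `RuntimeError` with it. One possible alternative that avoids the Manager proxy entirely is to have each worker inject its inherited `JoinableQueue` into every task, so the queue never has to travel through the queue itself. A sketch of that idea (the convention that every task function takes the queue as its first argument is my own, not from the original post):

```python
import multiprocessing


def worker(task_queue: multiprocessing.JoinableQueue) -> None:
    # The queue is inherited at Process creation, so tasks never need to
    # carry a queue reference themselves: the worker injects it.
    for (function, *args) in iter(task_queue.get, None):
        function(task_queue, *args)
        task_queue.task_done()


def foo(task_queue) -> None:
    print('foo')
    task_queue.put((bar, 1))  # only the function reference and an int are pickled


def bar(task_queue, x: int) -> None:
    print(f'bar: {x}')


def main() -> None:
    task_queue = multiprocessing.JoinableQueue()
    workers = [multiprocessing.Process(target=worker, args=(task_queue,))
               for _ in range(2)]
    for w in workers:
        w.start()

    task_queue.put((foo,))
    task_queue.join()  # wait for foo and the bar task it enqueues

    for _ in workers:
        task_queue.put(None)
    for w in workers:
        w.join()

    print('Program completed.')


if __name__ == '__main__':
    main()
```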