How to add a pool of processes available for a multiprocessing queue
I am following on from a preceding question here: how to add more items to a multiprocessing queue while script in motion
The code I am working with now:
import multiprocessing

class MyFancyClass:
    def __init__(self, name):
        self.name = name

    def do_something(self):
        proc_name = multiprocessing.current_process().name
        print('Doing something fancy in {} for {}!'.format(proc_name, self.name))

def worker(q):
    while True:
        obj = q.get()
        if obj is None:
            break
        obj.do_something()

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(queue,))
    p.start()

    queue.put(MyFancyClass('Fancy Dan'))
    queue.put(MyFancyClass('Frankie'))
    # print(queue.qsize())

    queue.put(None)

    # Wait for the worker to finish
    queue.close()
    queue.join_thread()
    p.join()
Right now, there are two items in the queue. If I replace those two lines with a list of, say, 50 items, how do I initiate a pool to make a number of processes available? For example:
p = multiprocessing.Pool(processes=4)
Where does that go? I'd like to be able to run multiple items at once, especially if the items run for a bit. Thanks!
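For scale, the question's own Process-plus-Queue pattern extends to many items and several workers by starting more Process objects on the same queue and sending one sentinel per worker. A minimal sketch of that direction (using a toy squaring task in place of MyFancyClass; the worker/run helper names are illustrative, not from the original):

```python
import multiprocessing

def worker(in_q, out_q):
    # Pull items until a None sentinel arrives
    while True:
        item = in_q.get()
        if item is None:
            break
        out_q.put(item * item)

def run(items, num_workers=4):
    items = list(items)
    in_q = multiprocessing.Queue()
    out_q = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=worker, args=(in_q, out_q))
             for _ in range(num_workers)]
    for p in procs:
        p.start()
    for item in items:
        in_q.put(item)
    for _ in range(num_workers):  # one sentinel per worker so each loop exits
        in_q.put(None)
    for p in procs:
        p.join()
    # Exactly one result per input item, so drain by count
    return [out_q.get() for _ in items]

if __name__ == '__main__':
    print(sorted(run(range(10))))
```

The key detail is one sentinel per worker: each worker consumes exactly one None before exiting, so fewer sentinels would leave workers blocked on get().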
As a rule, you either use Pool, or Process(es) plus Queues. Mixing both is a misuse; the Pool already uses Queues (or a similar mechanism) behind the scenes.
If you want to do this with a Pool, change your code to the following (moving the code into a main function for performance, and for better resource cleanup than running in the global scope):
def main():
    myfancyclasses = [MyFancyClass('Fancy Dan'), ...]  # define your MyFancyClass instances here
    with multiprocessing.Pool(processes=4) as p:
        # Submit all the work
        futures = [p.apply_async(fancy.do_something) for fancy in myfancyclasses]
        # Done submitting, let workers exit as they run out of work
        p.close()
        # Wait until all the work is finished
        for f in futures:
            f.wait()

if __name__ == '__main__':
    main()
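One caveat worth adding here: f.wait() silently discards any exception the task raised, while f.get() re-raises it in the parent, where you can handle it. A small sketch of the difference (might_fail is an illustrative stand-in, not part of the original code):

```python
import multiprocessing

def might_fail(n):
    # Toy task standing in for MyFancyClass.do_something
    if n == 3:
        raise ValueError('bad input: {}'.format(n))
    return n * 2

def main():
    with multiprocessing.Pool(processes=4) as p:
        futures = [p.apply_async(might_fail, (n,)) for n in range(5)]
        p.close()
        results = []
        for f in futures:
            try:
                # Unlike wait(), get() re-raises worker exceptions here
                results.append(f.get())
            except ValueError as e:
                results.append(str(e))
    return results

if __name__ == '__main__':
    print(main())
```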
This could be simplified further, at the expense of purity, with the .*map* methods of Pool. For example, to minimize memory usage, redefine main as:
def main():
    myfancyclasses = [MyFancyClass('Fancy Dan'), ...]  # define your MyFancyClass instances here
    with multiprocessing.Pool(processes=4) as p:
        # No return value, so we ignore it, but we need to run out the result
        # or the work won't be done
        for _ in p.imap_unordered(MyFancyClass.do_something, myfancyclasses):
            pass
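A related tuning knob, not covered above: when each item is cheap, the per-item IPC round-trips can dominate, and imap_unordered accepts a chunksize argument that batches items per trip. A small sketch (the work function and the chunk size of 10 are illustrative assumptions, tune for your workload):

```python
import multiprocessing

def work(n):
    # Trivial per-item task; real work would be heavier
    return n + 1

def main():
    with multiprocessing.Pool(processes=4) as p:
        # chunksize=10 sends items to workers in batches of 10,
        # cutting IPC overhead; results still arrive in arbitrary order
        return sorted(p.imap_unordered(work, range(100), chunksize=10))

if __name__ == '__main__':
    print(main()[:5])
```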
Yes, technically either approach has a slightly higher overhead, since the return value you're not using must be serialized to send it back to the parent process. But in practice this cost is pretty low (since your function has no return, it returns None, which serializes to almost nothing). An advantage to this approach is that for printing to the screen, you generally don't want to print from the child processes (since they'll end up interleaving output); you can replace the prints with returns and let the parent do the printing, e.g.:
import multiprocessing

class MyFancyClass:
    def __init__(self, name):
        self.name = name

    def do_something(self):
        proc_name = multiprocessing.current_process().name
        # Changed from print to return
        return 'Doing something fancy in {} for {}!'.format(proc_name, self.name)

def main():
    myfancyclasses = [MyFancyClass('Fancy Dan'), ...]  # define your MyFancyClass instances here
    with multiprocessing.Pool(processes=4) as p:
        # Using the return value now to avoid interleaved output
        for res in p.imap_unordered(MyFancyClass.do_something, myfancyclasses):
            print(res)

if __name__ == '__main__':
    main()
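One trade-off worth noting: imap_unordered prints results in completion order. If output order matters, imap (or plain map) yields results in input order instead, at the cost of buffering results that finish early. A sketch with an illustrative describe function standing in for do_something:

```python
import multiprocessing

def describe(name):
    return 'Doing something fancy for {}!'.format(name)

def main():
    names = ['Fancy Dan', 'Frankie', 'Sally']
    with multiprocessing.Pool(processes=2) as p:
        # imap yields results in input order, unlike imap_unordered
        return list(p.imap(describe, names))

if __name__ == '__main__':
    for line in main():
        print(line)
```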
Note how all of these solutions remove the need to write your own worker function or manually manage Queues, because Pools do that grunt work for you.
An alternate approach uses concurrent.futures to efficiently process results as they become available, while allowing you to choose to submit new work (either based on the results, or based on external information) as you go:
import concurrent.futures
from concurrent.futures import FIRST_COMPLETED

def main():
    allow_new_work = True  # Set to False to indicate we'll no longer allow new work
    myfancyclasses = [MyFancyClass('Fancy Dan'), ...]  # define your initial MyFancyClass instances here
    with concurrent.futures.ProcessPoolExecutor() as executor:
        remaining_futures = {executor.submit(fancy.do_something)
                             for fancy in myfancyclasses}
        while remaining_futures:
            done, remaining_futures = concurrent.futures.wait(remaining_futures,
                                                              return_when=FIRST_COMPLETED)
            for fut in done:
                result = fut.result()
                # Do stuff with result, maybe submit new work in response
            if allow_new_work:
                if should_stop_checking_for_new_work():
                    allow_new_work = False
                    # Let the workers exit when all remaining tasks are done,
                    # and reject submitting more work from now on
                    executor.shutdown(wait=False)
                elif has_more_work():
                    # Assumed to return a collection of new MyFancyClass instances
                    new_fanciness = get_more_fanciness()
                    remaining_futures |= {executor.submit(fancy.do_something)
                                          for fancy in new_fanciness}
                    myfancyclasses.extend(new_fanciness)
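The sketch above leans on undefined helpers (should_stop_checking_for_new_work, has_more_work, get_more_fanciness), so it won't run as-is. A runnable baseline with the resubmission logic stripped out, using concurrent.futures.as_completed to drain results as they finish (fancy_task is an illustrative stand-in for do_something):

```python
import concurrent.futures

def fancy_task(name):
    # Stand-in for MyFancyClass.do_something
    return 'Done: {}'.format(name)

def main():
    names = ['Fancy Dan', 'Frankie', 'Sally']
    results = []
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
        futures = {executor.submit(fancy_task, n) for n in names}
        # as_completed yields each future as soon as it finishes,
        # so slow tasks don't block handling of fast ones
        for fut in concurrent.futures.as_completed(futures):
            results.append(fut.result())
    return sorted(results)

if __name__ == '__main__':
    print(main())
```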