Python multiprocessing: reduce during map?
Is there a way to reduce memory consumption when working with Python's pool.map?
To give a short example: worker() does some heavy lifting and returns a large array...
    def worker(item):
        # cpu time intensive tasks
        return large_array
...and a Pool maps over some large sequence:

    with mp.Pool(mp.cpu_count()) as p:
        result = p.map(worker, large_sequence)
Considering this setup, result will obviously allocate a large portion of the system's memory. However, the final operation on the result is:
    final_result = np.sum(result, axis=0)
Thus, NumPy effectively does nothing more than reduce the iterable with a sum operation:
    from functools import reduce
    final_result = reduce(lambda x, y: x + y, result)
This, of course, would make it possible to consume the results of pool.map as they come in and let them be garbage-collected after reducing, which would eliminate the need to store all the values first.
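To make the idea concrete, here is a minimal single-process sketch of the difference between summing a fully materialised list and folding results one at a time. The generator make_arrays is a hypothetical stand-in for results arriving from workers; it is not part of the original setup. Note that pool.map itself returns a complete list, so this kind of incremental reduction only helps once the results are delivered lazily.

```python
from functools import reduce

import numpy as np


def make_arrays(n, shape=(2, 3)):
    """Yield n arrays one at a time (hypothetical stand-in for worker results)."""
    for i in range(n):
        yield np.full(shape, float(i))


# Eager: build the whole list of arrays first, then sum along the first axis.
eager = np.sum(list(make_arrays(5)), axis=0)

# Lazy: fold each array into a running total as the generator produces it,
# so only the running total and the current array are needed at any moment.
lazy = reduce(np.add, make_arrays(5))

print(np.array_equal(eager, lazy))
```

Both variants give the same array; the lazy one simply never needs the full list.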
I could write some mp.Queue now that the results go into, and then write a queue-consuming worker that sums up the results, but this would (1) require significantly more lines of code and (2) feel like a (potentially slower) hack-around to me rather than clean code.
Is there a way to reduce the results returned by an mp.Pool operation directly as they come in?
The iterator mappers imap and imap_unordered seem to do the trick:
    #!/usr/bin/env python3
    import multiprocessing
    import numpy as np

    def worker(a):
        # cpu time intensive tasks
        large_array = np.ones((20, 30)) + a
        return large_array

    if __name__ == '__main__':
        arraysum = np.zeros((20, 30))
        large_sequence = range(20)
        num_cpus = multiprocessing.cpu_count()
        with multiprocessing.Pool(processes=num_cpus) as p:
            # accumulate each result as soon as a worker delivers it
            for large_array in p.imap_unordered(worker, large_sequence):
                arraysum += large_array
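If the explicit loop feels too verbose, the same pattern can probably be written as a single reduce call, which is closer to what the question asked for. The following is only a sketch under assumptions: the helper name sum_results, the SHAPE constant and the chunksize default are illustrative choices, not part of the code above. Because addition does not depend on order, imap_unordered should be safe here; a reduction that does depend on order would need imap instead.

```python
import multiprocessing
from functools import reduce

import numpy as np

SHAPE = (20, 30)  # assumed result shape, taken from the example above


def worker(a):
    # Stand-in for the CPU-intensive task.
    return np.ones(SHAPE) + a


def sum_results(pool, func, items, initial, chunksize=1):
    """Sum the results of func over items, starting from initial.

    The results are folded as the pool yields them, so the full list of
    arrays is never built in the parent process.
    """
    return reduce(np.add, pool.imap_unordered(func, items, chunksize), initial)


if __name__ == "__main__":
    with multiprocessing.Pool() as p:
        total = sum_results(p, worker, range(20), np.zeros(SHAPE))
    print(total[0, 0])
```

A larger chunksize may cut inter-process overhead for long sequences of cheap tasks, but each finished chunk then delivers several arrays at once, so it trades some memory for throughput.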