Python multiprocessing: reduce during map?

Is there a way to reduce memory consumption when working with Python's pool.map?

To give a short example: worker() does some heavy lifting and returns a large array...

def worker(x):
    # CPU time-intensive tasks that produce a large array
    return large_array

...and a Pool maps it over some large sequence:

with mp.Pool(mp.cpu_count()) as p:
    result = p.map(worker, large_sequence)

Considering this setup, obviously, result will allocate a large portion of the system's memory. However, the final operation on the result is:

    final_result = np.sum(result, axis=0)

Thus, NumPy effectively does nothing other than reduce the iterable with a sum operation:

    final_result = reduce(lambda x, y: x + y, result)

This, of course, would make it possible to consume results of pool.map as they come in and garbage-collect them after reducing, eliminating the need to store all the values first.

I could write an mp.Queue that the results go into and then write a queue-consuming worker that sums up the results, but this would (1) require significantly more lines of code and (2) feel like a (potentially slower) hack-around rather than clean code; a sketch of that approach follows.
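For concreteness, here is a minimal sketch of that queue-based work-around, assuming one process per input item (a real version would cap the number of concurrent processes; worker and the other names here are illustrative, not from the original post):

import multiprocessing as mp
import numpy as np

def worker(a, queue):
    # CPU-intensive work; push each large result into the shared queue
    queue.put(np.ones((20, 30)) + a)

if __name__ == '__main__':
    large_sequence = range(20)
    queue = mp.Queue()
    procs = [mp.Process(target=worker, args=(a, queue)) for a in large_sequence]
    for proc in procs:
        proc.start()

    # drain the queue as results arrive, reducing immediately so that
    # only one result array is alive in the parent at a time
    arraysum = np.zeros((20, 30))
    for _ in large_sequence:
        arraysum += queue.get()

    for proc in procs:
        proc.join()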

Is there a way to reduce the results returned by an mp.Pool operation directly as they come in?

The iterator mappers imap and imap_unordered seem to do the trick:

#!/usr/bin/env python3

import multiprocessing
import numpy as np

def worker(a):
    # CPU time-intensive tasks that produce a large array
    large_array = np.ones((20, 30)) + a
    return large_array


if __name__ == '__main__':
    arraysum = np.zeros((20, 30))
    large_sequence = range(20)
    num_cpus = multiprocessing.cpu_count()

    with multiprocessing.Pool(processes=num_cpus) as p:
        # consume results one at a time as workers finish, so only a
        # single large_array is held in the parent at any moment
        for large_array in p.imap_unordered(worker, large_sequence):
            arraysum += large_array
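Equivalently, the accumulation loop can be written as a fold over the iterator with functools.reduce; this is just a sketch reusing the worker, large_sequence, and num_cpus names defined above:

import functools
import operator

with multiprocessing.Pool(processes=num_cpus) as p:
    # reduce consumes the iterator lazily, so only the running total
    # and the most recently received array are held in memory
    arraysum = functools.reduce(operator.add,
                                p.imap_unordered(worker, large_sequence),
                                np.zeros((20, 30)))

Note that imap and imap_unordered also accept a chunksize argument; larger chunks reduce inter-process communication overhead at the cost of keeping more results in flight at once.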
