Preventing RAM from being repeatedly filled by a Python function

Question

I have an example Python function below, which simply takes in a variable and performs a simple mathematical operation on it before returning.

If I parallelise this function, to better reflect the operation I would like to do in real life, and run the parallelised function 10 times, I notice in my IDE that memory usage increases despite the del results line.

import multiprocessing as mp
import numpy as np
from tqdm import tqdm

def function(x):
        return x*2

test_array = np.arange(0,1e4,1)

for i in range(10):

        pool = mp.Pool(processes=4)
        results = list(tqdm(pool.imap(function,test_array),total=len(test_array)))
        results = [x for x in results if str(x) != 'nan']

        del results

I have a few questions I would be grateful to know the answers to:

Is there a way to prevent this memory increase?
Is this memory increase due to the parallelisation process?

Answer 1

Each new process that pool.imap creates needs to receive some information about the function and the element it applies the function to. This information is copied to the process, and every copy takes up additional memory.

If you want to reduce it, you might want to look at the chunksize argument of pool.imap.
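As a rough illustration of the chunksize suggestion, here is a minimal sketch. The names double and run_in_chunks, the chunk size of 500 and the two worker processes are assumptions made for the example, not values from the question. With a larger chunksize, each task sent to a worker carries many items instead of one, so fewer messages are pickled and queued.

```python
import multiprocessing as mp


def double(x):
    # Stand-in for the question's function (hypothetical name).
    return x * 2


def run_in_chunks(values, chunksize, processes=2):
    # Each task handed to a worker holds `chunksize` items rather than one.
    # The with-block terminates the worker processes when it exits.
    with mp.Pool(processes=processes) as pool:
        return list(pool.imap(double, values, chunksize=chunksize))


if __name__ == "__main__":
    results = run_in_chunks(range(10_000), chunksize=500)
    print(len(results), results[:3])
```

Whether this actually lowers peak memory depends on the real workload, so it is worth measuring rather than assuming.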

Another way would be to rely only on functions from numpy. You might already know this, but you could just do results = test_array * 2. I don't know what your real-life example looks like, but you might not need to use Python's pool.
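For the toy function in the question, the numpy-only version might look like the sketch below. Using np.isnan to replace the question's string comparison against 'nan' is an assumption about what that filter is meant to do.

```python
import numpy as np

test_array = np.arange(0, 1e4, 1)

# One vectorised operation in a single process: no pool, no pickling.
results = test_array * 2

# Assumed equivalent of the question's `str(x) != 'nan'` filter.
results = results[~np.isnan(results)]
```

This only helps if the real operation can be expressed with array operations.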

Also, if you intend to write fast code, don't use tqdm. It is nice, and if you need it, you need it, but it will slow down your code.

Answer 2

I haven't tried this out, but I'm quite sure you don't need to define

pool = mp.Pool(processes=4)

within the loop. You're starting up 10 instances of the pool for no reason. Maybe try moving that out and seeing if your memory usage decreases?
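A minimal sketch of that change, assuming a hypothetical helper repeat_with_one_pool and two worker processes: the pool is created once and reused on every pass. The question's loop also never closes its pools, so their worker processes may linger. The with-block here terminates the workers when it exits.

```python
import multiprocessing as mp


def double(x):
    # Stand-in for the question's function (hypothetical name).
    return x * 2


def repeat_with_one_pool(values, repeats, processes=2):
    totals = []
    # Create the pool once, outside the loop, and reuse it for every pass.
    with mp.Pool(processes=processes) as pool:
        for _ in range(repeats):
            results = pool.map(double, values)
            totals.append(sum(results))
    return totals


if __name__ == "__main__":
    print(repeat_with_one_pool(range(10_000), repeats=10)[:2])
```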

If that doesn't help, consider restructuring your code to use yield instead, to prevent your memory from filling up.
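One way to read the yield suggestion is sketched below. The names iter_results and running_total are made up for the example, and it assumes the results can be consumed one at a time. The caller never builds the full results list, although the pool may still buffer finished results internally if the consumer is slower than the workers.

```python
import multiprocessing as mp


def double(x):
    # Stand-in for the question's function (hypothetical name).
    return x * 2


def iter_results(values, processes=2, chunksize=100):
    # Generator: yields results as pool.imap produces them, in input order.
    with mp.Pool(processes=processes) as pool:
        yield from pool.imap(double, values, chunksize=chunksize)


def running_total(values):
    # Consume each result as it arrives instead of storing all of them.
    total = 0
    for value in iter_results(values):
        total += value
    return total


if __name__ == "__main__":
    print(running_total(range(10_000)))
```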
