Memory leak when using multiprocessing

As the title says, I'm struggling with a memory leak when using multiprocessing. I know questions like this have been asked before, but I still cannot find the right solution for my problem.

I have a list of RGB images (30,000 in total). I need to read each image, process all three RGB channels, then keep the result in memory (to be saved in one big file later).

I'm trying to use something like this:

import multiprocessing as mp
import random
import numpy as np


# Define an output queue to store results
output = mp.Queue()

# define an example function
def read_and_process_image(id, output):
    result = np.random.randint(256, size=(100, 100, 3)) #fake an image
    output.put(result)

# Set up a list of processes that we want to run
processes = [mp.Process(target=read_and_process_image, args=(id, output)) for id in range(30000)]

# Run processes
for p in processes:
    p.start()

# # Exit the completed processes
# for p in processes:
#     p.join()

# Get process results from the output queue
results = [output.get() for p in processes]

print(results)

This code uses a lot of memory. This answer explained the problem, but I cannot find a way to apply it to my code. Any suggestions? Thanks!

Edit: I also tried joblib and the Pool class, but the code won't use all the cores as I expected (I saw no difference between a normal for loop and these two approaches).
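
For reference, a typical joblib version of this task looks something like the sketch below. This is an assumed reconstruction, not the code actually tried: the function body is the same fake-image stand-in as above, and n_jobs=-1 asks joblib for one worker per available core.

from joblib import Parallel, delayed
import numpy as np

def read_and_process_image(_id):
    return np.random.randint(256, size=(100, 100, 3))  # fake an image

# n_jobs=-1 requests one worker per core; joblib manages the pool itself
results = Parallel(n_jobs=-1)(
    delayed(read_and_process_image)(_id) for _id in range(100)
)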

I'd use a pool to limit the number of processes spawned. I've written a demonstration based on your code:

import multiprocessing as mp
import os
import numpy as np

# define an example function
def read_and_process_image(_id):
    print("Process %d is working" % os.getpid())
    return np.random.randint(256, size=(100, 100, 3))  # fake an image

# Set up a list of arguments that we want to run the function with
taskargs = [_id for _id in range(100)]

# Open a pool with half the available cores (at least one), so only
# that many worker processes ever exist at the same time
with mp.Pool(max(1, mp.cpu_count() // 2)) as pool:
    # map blocks until every task has finished, then the with-block
    # shuts the workers down cleanly
    results = pool.map(read_and_process_image, taskargs)

print(results)

I know the arguments are not used, but I thought you'd want to see how to do it in case you do need them (also, I've changed id to _id since id is a builtin).
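
If keeping all 30,000 processed images in one Python list is itself too much memory, the same pool can stream results straight into a preallocated on-disk array instead. The sketch below shows the idea; the filename results.npy, the uint8 dtype, and the 100x100x3 shape are illustrative assumptions.

import multiprocessing as mp
import numpy as np

def read_and_process_image(_id):
    # fake an image; return the index so we know where to store it
    return _id, np.random.randint(256, size=(100, 100, 3))

if __name__ == "__main__":
    n_images = 30000
    # Preallocated on-disk array: results land here instead of in RAM
    out = np.lib.format.open_memmap(
        "results.npy", mode="w+", dtype=np.uint8,
        shape=(n_images, 100, 100, 3),
    )
    with mp.Pool(max(1, mp.cpu_count() // 2)) as pool:
        # imap_unordered yields each result as soon as a worker finishes,
        # so only a handful of images sit in memory at any moment
        for _id, img in pool.imap_unordered(
            read_and_process_image, range(n_images), chunksize=64
        ):
            out[_id] = img
    out.flush()

Because imap_unordered hands results back incrementally, memory use stays roughly constant no matter how many images are processed, and the memmap already is the one big file to be saved.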
