memory leak when using multiprocessing
As in the title, I'm struggling with a memory leak when using multiprocessing. I know questions like this have been asked before, but I still cannot find the right solution for my problem.
I have a list of RGB images (30,000 total). I need to read each image, process all three RGB channels, then keep the result in memory (to be saved in 1 big file later).
I'm trying to use something like this:
import multiprocessing as mp
import random
import numpy as np

# Define an output queue to store results
output = mp.Queue()

# Define an example function
def read_and_process_image(id, output):
    result = np.random.randint(256, size=(100, 100, 3))  # fake an image
    output.put(result)

# Set up a list of processes that we want to run
processes = [mp.Process(target=read_and_process_image, args=(id, output)) for id in range(30000)]

# Run processes
for p in processes:
    p.start()

# # Exit the completed processes
# for p in processes:
#     p.join()

# Get process results from the output queue
results = [output.get() for p in processes]
print(results)
This code uses a lot of memory. This answer explained the problem, but I cannot find a way to apply it to my code. Any suggestions? Thanks!
Edit: I also tried joblib and the Pool class, but the code won't use all the cores like I expected (I see no difference between using a normal for loop and these 2 approaches).
I'd use a pool to limit the number of processes spawned. I've written a demonstration relying on your code:
import multiprocessing as mp
import os
import numpy as np

# Define an example function
def read_and_process_image(_id):
    print("Process %d is working" % os.getpid())
    return np.random.randint(256, size=(100, 100, 3))

# Set up a list of arguments that we want to run the function with
taskargs = [_id for _id in range(100)]

# Open a pool of processes
pool = mp.Pool(max(1, mp.cpu_count() // 2))

# Run processes
results = pool.map(read_and_process_image, taskargs)
print(results)
I know the arguments are not used, but I thought you'd want to see how to do it in case you do need it (also, I've changed id to _id since id is a builtin).