How do I parallelize a simple loop in Python?
I have a loop that crashes my RAM every time, and I would like to parallelize it.
I tried this code, but it doesn't work:
from joblib import Parallel, delayed
from Bio.Align.Applications import ClustalOmegaCommandline

def run(test):
    im = process_image(Image.open(test['Path'][i]))
    test_images.append(im)

if __name__ == "__main__":
    test_images = []
    test = range(len(test))
    Parallel(n_jobs=len(test)(
        delayed(run)(i) for i in len(test))
I got this error:
File "", line 16
    delayed(run)(i) for i in len(test))
                                      ^
SyntaxError: unexpected EOF while parsing
My loop:
test_images = []
for i in range(len(test)):
    im = process_image(Image.open(test['Path'][i]))
    test_images.append(im)
test_images = np.asarray(test_images)
I have tried several solutions, but I need a single database output.
Can you try the following:
import concurrent.futures

import numpy as np
from PIL import Image

def process_image(img_path):
    img_obj = Image.open(img_path)
    # your logic here
    return img_obj

def main():
    image_dict = {}
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # executor.map preserves input order, so paths and results line up
        for img_path, im in zip(test['Path'], executor.map(process_image, test['Path'])):
            image_dict[img_path] = im
    return image_dict

if __name__ == '__main__':
    image_dict = main()
    # dict.values() is a view in Python 3; convert to a list before np.asarray
    test_images = np.asarray(list(image_dict.values()))
I am not sure if parallelization is the answer to memory problems.
Do you need to store every image in a list held in memory? Maybe just save the paths and load each image when it is needed?
Or try out generators. With a generator the values are produced lazily (only when they are needed), which results in lower memory consumption.
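A minimal sketch of the generator idea; `image_stream` is a hypothetical helper, and the uppercasing lambda is just a dummy stand-in for the question's `process_image`:

```python
def image_stream(paths, process_image):
    # Yields one processed image at a time; nothing is kept in memory
    # beyond the item currently being consumed.
    for img_path in paths:
        yield process_image(img_path)

# Dummy "processing" function for demonstration; in the real code this
# would open the image with PIL and apply the user's logic.
images = image_stream(["a.png", "b.png"], lambda p: p.upper())
print(next(images))  # only the first path has been processed so far
```

If the downstream code truly needs everything at once (e.g. one NumPy array), a generator won't reduce peak memory, but for per-image work it keeps only one image live at a time.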