
Python and multiprocessing

I have three Python functions. Each takes an image (an image path) as input, performs some simple image processing, and creates a new image (an image path) as output.

In the example below, each function depends on the previous one: alg2 takes as input the image that alg1 generates, and alg3 takes as input the image that alg2 generates, which in turn depends on alg1.

(I hope you do not mind that this is a basic question.)

Because of their relatively high execution time (as is typical of image processing), I would like to ask whether I can parallelize them using Python multiprocessing. I have read about multiprocessing's map and Pool, but I was pretty confused.

To summarize: I have three interdependent functions, and I would like to run them together if that is possible. I would also like to know how I would run these three functions concurrently if they were not interdependent, i.e. if each one was autonomous.

import timeit

def alg1(input_path_image, output_path_image):
    start = timeit.default_timer()
    ###processing###
    stop = timeit.default_timer()
    print(stop - start)
    return output_path_image

def alg2(output_path_image, output_path_image1):
    start = timeit.default_timer()
    ###processing###
    stop = timeit.default_timer()
    print(stop - start)
    return output_path_image1

def alg3(output_path_image1, output_path_image2):
    start = timeit.default_timer()
    ###processing###
    stop = timeit.default_timer()
    print(stop - start)
    return output_path_image2

if __name__ == '__main__':
    # the *_path_image names stand for the actual image paths
    alg1(input_path_image, output_path_image)
    alg2(output_path_image, output_path_image1)
    alg3(output_path_image1, output_path_image2)

Here is what I would do:

I would split the list of images into smaller parts. Then I would combine those three functions into one (keeping the other two as private helpers, just for the sake of simplicity). Then you can speed up the whole process like this:

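from multiprocessing import Process

image_list = this_is_your_huge_image_list  # placeholder for your own list of image paths
# create smaller image lists, e.g. [[1, 2, 3], [4, 5, 6], ...]
chunked_lists = [image_list[x:x+100] for x in range(0, len(image_list), 100)]

for img_list in chunked_lists:
    # your_main_func is the combined function that runs all three steps on each image
    p = Process(target=your_main_func, args=(img_list,))
    p.start()
    # do not call p.join() inside this loop, or the chunks would run one after another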
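
The same idea can also be sketched with multiprocessing.Pool, which hands the images out to a fixed number of worker processes and collects the return values. This is only a minimal sketch assuming Python 3: the bodies of alg1, alg2 and alg3 are placeholders that build a new path string instead of processing anything, and process_image and the file names are hypothetical names introduced for illustration. Within one image the three steps still run in order, because each needs the output of the previous one; the parallelism comes from handling different images at the same time.

```python
from multiprocessing import Pool


def alg1(path):
    # Placeholder for the first processing step; returns the new image path.
    return path + ".step1"


def alg2(path):
    # Placeholder for the second step; consumes the output of alg1.
    return path + ".step2"


def alg3(path):
    # Placeholder for the third step; consumes the output of alg2.
    return path + ".step3"


def process_image(path):
    # The steps depend on each other, so they run sequentially for one image.
    return alg3(alg2(alg1(path)))


if __name__ == "__main__":
    image_list = ["a.png", "b.png", "c.png"]  # hypothetical input paths
    with Pool() as pool:
        # Different images are independent, so the pool spreads them over
        # worker processes; chunksize groups the work like the chunked lists above.
        results = pool.map(process_image, image_list, chunksize=100)
    print(results)
```

pool.map returns the results in the same order as image_list, so each output path can be matched to its input.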

It sounds like you're doing something CPU-intensive, so you'll need to use multiprocessing.Process objects rather than threading.Thread. The return value of a function run by a multiprocessing.Process is not passed back to the parent process, so you will need something like a multiprocessing.Manager to collect the results.

So this is an adaptation of your code that will work with multiprocessing.Process. Note that it starts all three functions at the same time, so it fits the case where they do not depend on each other:

import timeit
from multiprocessing import Process, Manager

def alg1(input_path_image, output_path_image, return_dict):
    start = timeit.default_timer()
    ###processing###
    stop = timeit.default_timer()
    print(stop - start)
    # store the result in the shared dict instead of returning it
    return_dict['algo1'] = output_path_image

def alg2(output_path_image, output_path_image1, return_dict):
    start = timeit.default_timer()
    ###processing###
    stop = timeit.default_timer()
    print(stop - start)
    return_dict['algo2'] = output_path_image1

def alg3(output_path_image1, output_path_image2, return_dict):
    start = timeit.default_timer()
    ###processing###
    stop = timeit.default_timer()
    print(stop - start)
    return_dict['algo3'] = output_path_image2

if __name__ == '__main__':
    manager = Manager()
    return_dict = manager.dict()  # dict shared between the processes
    a1 = Process(target=alg1, args=(input_path_image, output_path_image, return_dict))
    a2 = Process(target=alg2, args=(output_path_image, output_path_image1, return_dict))
    a3 = Process(target=alg3, args=(output_path_image1, output_path_image2, return_dict))
    jobs = [a1, a2, a3]
    # start all processes first, then wait for all of them to finish
    for job in jobs:
        job.start()
    for job in jobs:
        job.join()
    a1_return = return_dict['algo1']
    a2_return = return_dict['algo2']
    a3_return = return_dict['algo3']

You'll need to modify this further to make your print statements distinguishable. At the moment, each one prints only a number, so you won't be able to tell which function it came from.
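
One possible way to label the timings is a small decorator that prints the function name next to the elapsed time. This is a sketch assuming Python 3, not part of the original code: the timed decorator, the placeholder function bodies and the file names are all hypothetical and only there for illustration.

```python
import functools
import timeit
from multiprocessing import Manager, Process


def timed(func):
    """Print the elapsed time together with the name of the wrapped function."""
    @functools.wraps(func)  # keeps the original name, so Process can still find it
    def wrapper(*args, **kwargs):
        start = timeit.default_timer()
        result = func(*args, **kwargs)
        elapsed = timeit.default_timer() - start
        print("{}: {:.3f} s".format(func.__name__, elapsed))
        return result
    return wrapper


@timed
def alg1(input_path_image, output_path_image, return_dict):
    # Placeholder for the real processing; only records the output path.
    return_dict['algo1'] = output_path_image


@timed
def alg2(input_path_image, output_path_image, return_dict):
    # Placeholder for the real processing; only records the output path.
    return_dict['algo2'] = output_path_image


if __name__ == "__main__":
    manager = Manager()
    return_dict = manager.dict()
    # Hypothetical, independent inputs and outputs for the two functions.
    jobs = [
        Process(target=alg1, args=("in1.png", "out1.png", return_dict)),
        Process(target=alg2, args=("in2.png", "out2.png", return_dict)),
    ]
    for job in jobs:
        job.start()
    for job in jobs:
        job.join()
    print(dict(return_dict))
```

Each line of timing output then starts with alg1 or alg2, whichever process printed it. The order of those lines can vary between runs, since the processes finish independently.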
