
How to use multiprocessing for a 3-argument function in Python

I have three types of files, with the same number of files of each type (around 500), and I have to pass these files to a function. How can I use multiprocessing for this? The files are rgb_image: 15.png, 16.png, 17.png, ...; depth_img: 15.png, 16.png, 17.png, ...; and mat: 15.mat, 16.mat, 17.mat, ... For each number, I have to pass the three matching files (for example 15.png, 15.png and 15.mat) as arguments to the function. The starting names of the files can vary, but they always follow this format.

The code is as follows:

def depth_rgb_registration(rgb, depth, mat):
    # The required operation is performed here.
    # The output of this function is a list (gait_list).
    ...



def display_fun(mat, selected_depth, selected_color, excel):

    for idx, color_img in enumerate(color_lists):   
        for i in range(len(depth_lists)):
            if color_img.split('.')[0] == depth_lists[i].split('.')[0]:
                rgb = os.path.join(selected_color, color_img)
                depth = os.path.join(selected_depth, sorted(depth_lists)[i])
                m = sorted(mat_lists)[idx]
                mat2 = os.path.join(mat, m)

                abc = color_img.split('.')[0]
                gait_list1 = []

                fnum = int("".join([str(i) for i in re.findall(r"(\d+)", abc)]))

                gait_list1.append(fnum)
                depth_rgb_registration(rgb, depth,mat2)
                gait_list2.append(gait_list1) #Output gait_list1 from above function
                data1 = pd.DataFrame(gait_list2)
                data1.to_excel(writer, index=False)
                wb.save(excel)

In the above code, display_fun is the main function, which is called from other code. In this function, color_img, depth_img and mat are three different types of files read from the folders. These three files are passed as arguments to the depth_rgb_registration function. In that function, some required values are stored in gait_list1, which is then written to an Excel file for every set of files.

The loop above works, but it takes around 20-30 minutes to run, depending on the number of files. So I wanted to use multiprocessing to reduce the overall time.

I tried multiprocessing by following some examples, but I do not understand how to pass these three files as arguments. I know that using a dictionary, as I have done below, is not correct, but what would be an alternative? Asynchronous multiprocessing would also be fine. I even thought of using a GPU to run the function, but from what I have read, extra time would go into loading the data onto the GPU. Any suggestions?

def display_fun2(mat, selected_depth, selected_color, results, excel):

    path3 = selected_depth
    path4 = selected_color
    path5 = mat

    rgb_depth_pairs = defaultdict(list)

    for rgb in path4.iterdir():
        rgb_depth_pairs[rgb.stem].append(rgb)

    included_extensions = ['png']
    images = [fn for ext in included_extensions for fn in path3.glob(f'*.{ext}')]

    for image in images:
        rgb_depth_pairs[image.stem].append(image)

    for mat in path5.iterdir():
        rgb_depth_pairs[mat.stem].append(mat)

    rgb_depth_pairs = [item for item in rgb_depth_pairs.items() if len(item) == 3]

    with Pool() as p:
        p.starmap_async(process_pairs, rgb_depth_pairs) 

    gait_list2.append(gait_list1)
    data1 = pd.DataFrame(gait_list2)
    data1.to_excel(writer, index=False)
    wb.save(excel)



def depth_rgb_registration(rgb, depth, mat):
    # Required operation for one set of files.
    ...

I did not look at the code in detail (it was too long), but provided that the combinations of arguments to be sent to your 3-argument function can be evaluated independently (outside of the function itself), you can simply use Pool.starmap.

For example:

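from multiprocessing import Pool

def myfunc(a, b, c):
    return 100*a + 10*b + c

myargs = [(2,3,1), (1,2,4), (5,3,2), (4,6,1), (1,3,8), (3,4,1)]

if __name__ == "__main__":
    # The guard is needed on platforms that start workers by spawning
    # a fresh interpreter (for example Windows).
    with Pool(2) as p:
        # starmap unpacks each tuple into the three arguments of myfunc.
        print(p.starmap(myfunc, myargs))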

This prints:

[231, 124, 532, 461, 138, 341]
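The question's attempt uses starmap_async rather than starmap. As a minimal sketch of that variant: starmap_async returns an AsyncResult immediately, and the results only become available through its get() method. Leaving the `with Pool()` block terminates the pool, so get() has to be called inside the block. The helper name run_async is hypothetical and only used for illustration.

```python
from multiprocessing import Pool


def myfunc(a, b, c):
    return 100*a + 10*b + c


def run_async(myargs, processes=2):
    # run_async is a hypothetical helper name, not part of multiprocessing.
    with Pool(processes) as p:
        async_result = p.starmap_async(myfunc, myargs)
        # The parent process is free to do other work here.
        # get() blocks until every task has finished; it must be called
        # before the with block ends, because leaving the block
        # terminates the pool.
        return async_result.get()


if __name__ == "__main__":
    print(run_async([(2, 3, 1), (1, 2, 4), (5, 3, 2)]))
```

If nothing else needs to happen while the workers run, plain starmap is simpler and gives the same list of results.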

Alternatively, if your function can be recast as a function which accepts a single argument (the tuple) and unpacks it into the separate variables that it needs, then you can use Pool.map:

def myfunc(t):
    a, b, c = t  # unpack the tuple and carry on
    return 100*a + 10*b + c

...

print(p.map(myfunc, myargs))
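Applied to the files in the question, the same idea might look like the sketch below. It assumes the three kinds of files live in three folders and share a stem (15.png, 15.png, 15.mat). The helper names build_triples and run_all, the folder names in the usage line, and the body of depth_rgb_registration are placeholders, not the asker's real code. Note that in the question's attempt, `rgb_depth_pairs.items()` yields (key, value) pairs, so `len(item)` is always 2 and the filter never matches; here the triples are built explicitly instead.

```python
from collections import defaultdict
from multiprocessing import Pool
from pathlib import Path


def build_triples(color_dir, depth_dir, mat_dir):
    """Group files by stem and return sorted (rgb, depth, mat) triples."""
    groups = defaultdict(dict)
    for kind, folder, pattern in (
        ("rgb", color_dir, "*.png"),
        ("depth", depth_dir, "*.png"),
        ("mat", mat_dir, "*.mat"),
    ):
        for path in Path(folder).glob(pattern):
            groups[path.stem][kind] = path
    # Keep only stems that are present in all three folders.
    return [
        (g["rgb"], g["depth"], g["mat"])
        for stem, g in sorted(groups.items())
        if len(g) == 3
    ]


def depth_rgb_registration(rgb, depth, mat):
    # Placeholder for the real per-file work. It returns a list so the
    # parent process can collect one row per set of files.
    return [rgb.stem, depth.name, mat.name]


def run_all(color_dir, depth_dir, mat_dir, processes=2):
    triples = build_triples(color_dir, depth_dir, mat_dir)
    if not triples:
        return []
    with Pool(processes) as pool:
        # starmap unpacks each triple into the three positional arguments
        # and returns the results in the same order as the input.
        return pool.starmap(depth_rgb_registration, triples)


if __name__ == "__main__":
    # "color", "depth" and "mat" are assumed folder names.
    rows = run_all("color", "depth", "mat")
    print(rows)
```

Because the worker returns its row instead of appending to a shared list, the parent can build the DataFrame and write the Excel file once, after the pool has finished, rather than once per set of files.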
