
Fastest way to run a single function in Python in parallel for multiple parameters

Suppose I have a single function, processing. I want to run this same function multiple times with different parameters, in parallel rather than sequentially one after the other.

import rasterio

def processing(image_location):
    # Open the raster; the actual processing steps are elided in the question.
    image = rasterio.open(image_location)
    ...
    ...
    return result

# Calling the function serially, one call after another, with different parameters and saving each result to a variable.
results1 = processing(r'/home/test/image_1.tif')
results2 = processing(r'/home/test/image_2.tif')
results3 = processing(r'/home/test/image_3.tif')

For example, if I run processing(r'/home/test/image_1.tif'), then processing(r'/home/test/image_2.tif'), and then processing(r'/home/test/image_3.tif'), as in the code above, the calls run sequentially, one after the other. If one call takes 5 minutes, running all three takes 5 x 3 = 15 minutes. Hence, I am wondering whether I can run these three calls in parallel (the task is embarrassingly parallel) so that executing the function for all three parameters takes only about 5 minutes.

What is the fastest way to do this? By default, the script should be able to use all the available resources (CPU cores and RAM) for this task.

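You can use multiprocessing to execute the function in parallel and save the results to the results variable. For CPU-bound work, use a process pool (multiprocessing.Pool): a ThreadPool runs every call inside one Python process, so unless the work releases the global interpreter lock (GIL), as some C extensions do, it will not use more than one core.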

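from multiprocessing import Pool

if __name__ == '__main__':  # needed where worker processes are spawned (e.g. Windows, macOS)
    images = [r'/home/test/image_1.tif', r'/home/test/image_2.tif', r'/home/test/image_3.tif']
    with Pool() as pool:  # one worker process per CPU core by default
        results = pool.map(processing, images)  # results come back in the same order as images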
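If you would rather handle each result as soon as its worker finishes (for example, to log progress on long-running images), Pool.imap_unordered yields results in completion order instead of input order. The sketch below is a minimal illustration under stated assumptions: fake_processing is a hypothetical stand-in that burns some CPU instead of opening rasters with rasterio, and because results arrive unordered, each value is returned together with its path.

```python
from multiprocessing import Pool


def fake_processing(image_location):
    # Hypothetical stand-in for processing(): a small CPU-bound loop
    # so the sketch runs without rasterio or real .tif files.
    total = 0
    for i in range(200_000):
        total += i % 7
    return len(image_location) + total


def tagged(image_location):
    # Return the path alongside the result, because imap_unordered
    # yields results in the order the workers finish, not input order.
    return image_location, fake_processing(image_location)


def process_as_completed(paths):
    results = {}
    with Pool() as pool:  # one worker process per CPU core by default
        for path, value in pool.imap_unordered(tagged, paths):
            print(f"finished {path}")  # reported as soon as that worker returns
            results[path] = value
    return results


if __name__ == "__main__":
    images = [r"/home/test/image_1.tif", r"/home/test/image_2.tif", r"/home/test/image_3.tif"]
    results = process_as_completed(images)
    print(results)
```

The functions passed to the pool are defined at module level so they can be pickled and sent to the worker processes.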

You might want to take a look at IPython Parallel. It lets you run functions on a load-balanced (local) cluster with little extra code.

For this small example, make sure you have IPython Parallel (ipyparallel), NumPy and Pillow installed. To run the example, you first need to launch the cluster. To launch a local cluster with four parallel engines, type the following into a terminal (one engine per processor core seems a reasonable choice):

ipcluster start -n 4

Then you can run the following script, which searches for JPEG images in a given directory and counts the number of pixels in each image:

import ipyparallel as ipp


rc = ipp.Client()
with rc[:].sync_imports():  # import on all engines
    import numpy
    from pathlib import Path
    from PIL import Image


lview = rc.load_balanced_view()  # default load-balanced view
lview.block = True  # block until map() is finished


@lview.parallel()
def count_pixels(fn: Path):
    """Silly function to count the number of pixels in an image file"""
    with Image.open(fn) as im:  # close the file handle once the pixels are read
        xx = numpy.asarray(im)
    num_pixels = xx.shape[0] * xx.shape[1]
    return fn.stem, num_pixels


pic_dir = Path('Pictures')
fn_lst = sorted(pic_dir.glob('*.jpg'))  # materialize the generator into a list of all jpg files in pic_dir

results = count_pixels.map(fn_lst)  # execute in parallel

for n_, cnt in results:
    print(f"'{n_}' has {cnt} pixels.")

Another way of writing this with the multiprocessing library (see @Alderven's answer, which uses a different function):

import multiprocessing as mp
import numpy as np

def calculate(input_args):
    result = input_args * 2
    return result

if __name__ == '__main__':  # needed where worker processes are spawned (e.g. Windows, macOS)
    N = mp.cpu_count()
    parallel_input = np.arange(0, 100)
    print('Number of CPUs:', N)
    print('Number of iterations:', len(parallel_input))

    with mp.Pool(processes=N) as p:
        results = p.map(calculate, list(parallel_input))

The results variable will contain a list with your processed data, in the same order as the input, which you can then write out or process further.
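If the function takes more than one argument per call, Pool.starmap unpacks each tuple of arguments into a separate call and, like map, keeps the results in input order. A minimal sketch, with a hypothetical three-argument function scale_and_shift standing in for real work:

```python
import multiprocessing as mp


def scale_and_shift(value, factor, offset):
    # Hypothetical three-argument worker used only for illustration.
    return value * factor + offset


def run_with_many_args(arg_tuples):
    # starmap unpacks each tuple into positional arguments:
    # (1, 2, 3) becomes scale_and_shift(1, 2, 3).
    with mp.Pool() as pool:
        return pool.starmap(scale_and_shift, arg_tuples)


if __name__ == "__main__":
    jobs = [(x, 2, 1) for x in range(5)]
    print(run_with_many_args(jobs))  # [1, 3, 5, 7, 9]
```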

I think one of the easiest methods is using joblib:

import joblib

images = [r'/home/test/image_1.tif', r'/home/test/image_2.tif', r'/home/test/image_3.tif']

# delayed() records the call without running it; Parallel then runs all recorded calls
allJobs = [joblib.delayed(processing)(image) for image in images]

# n_jobs=joblib.cpu_count() uses every core; verbose=10 prints progress as jobs finish
results = joblib.Parallel(n_jobs=joblib.cpu_count(), verbose=10)(allJobs)
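For comparison, if adding a dependency such as joblib is not an option, the standard library's concurrent.futures.ProcessPoolExecutor offers a similar map-style interface. This is a minimal sketch, assuming a hypothetical stand-in fake_processing in place of the question's rasterio-based processing function:

```python
from concurrent.futures import ProcessPoolExecutor


def fake_processing(image_location):
    # Hypothetical stand-in for the question's processing() function;
    # it just measures the path so the sketch runs without rasterio.
    return len(image_location)


def run_all(paths, max_workers=None):
    # max_workers=None lets the executor choose a default based on the CPU count.
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        # executor.map returns results in the same order as paths.
        return list(executor.map(fake_processing, paths))


if __name__ == "__main__":
    images = [r"/home/test/image_1.tif", r"/home/test/image_2.tif", r"/home/test/image_3.tif"]
    print(run_all(images))  # [22, 22, 22]
```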
