How to speed up the 'for' loop in a Python function?
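I have a function var. I would like to know the best possible way to run the for loop inside this function quickly (for multiple coordinates: xs and ys) through multiprocessing/parallel processing, using all the processors, cores, and RAM the system has.
Is it possible to use the Dask module?
The pysheds documentation can be found here.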
import numpy as np
from pysheds.grid import Grid
xs = 82.1206, 72.4542, 65.0431, 83.8056, 35.6744
ys = 25.2111, 17.9458, 13.8844, 10.0833, 24.8306
for (x, y) in zip(xs, ys):
    grid = Grid.from_raster('E:/data.tif', data_name='map')
    grid.catchment(data='map', x=x, y=y, out_name='catch', recursionlimit=1500, xytype='label')
    ....
    ....
    results
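You did not post a link to your image1.tif file, so the example code below uses pysheds/data/dem.tif from https://github.com/mdbartos/pysheds.
The basic idea is to split the input parameters, xs and ys in your case, into subsets, and then give each CPU a different subset to work on.
main() computes the solution twice, once sequentially and once in parallel, and then compares the two. The parallel solution is somewhat inefficient because every CPU reads the image file, so there is room for improvement (i.e., read the image file outside the parallel section and then pass the resulting grid object to each instance).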
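The split-and-reassemble step can be checked on its own, without pysheds or Dask. The following is only a minimal sketch: split_strided and merge_strided are hypothetical helper names that do not appear in the answer, and squaring each value stands in for the real per-point catchment computation.

```python
def split_strided(values, n_chunks):
    """Split values into n_chunks interleaved subsets; chunk k is values[k::n_chunks]."""
    return [values[k::n_chunks] for k in range(n_chunks)]


def merge_strided(chunks, total):
    """Inverse of split_strided: write each chunk back to its strided positions."""
    merged = [None] * total
    n_chunks = len(chunks)
    for k, chunk in enumerate(chunks):
        merged[k::n_chunks] = chunk
    return merged


xs = (10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
chunks = split_strided(xs, 3)  # chunks[0] holds 10, 40, 70, 100

# Stand-in for the work done on each subset (an assumption for illustration only).
processed = [[x * x for x in chunk] for chunk in chunks]

# The results come back in the original input order.
restored = merge_strided(processed, len(xs))
```

Because every subset is written back to the same strided positions it was taken from, the merged list lines up with the sequential result element by element, which is what makes the max-error comparison in main() meaningful.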
import numpy as np
from pysheds.grid import Grid
from dask.distributed import Client
from dask import delayed, compute
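xs = 10, 20, 30, 40, 50, 60, 70, 80, 90, 100
# note: ys has one more value than xs; zip() stops at the shorter
# sequence, so the trailing 125 is never used
ys = 25, 35, 45, 55, 65, 75, 85, 95, 105, 115, 125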

def var(image_file, x_in, y_in):
    # read the raster once per call, then process every (x, y) pair in turn
    grid = Grid.from_raster(image_file, data_name='map')
    variable_avg = []
    for (x, y) in zip(x_in, y_in):
        grid.catchment(data='map', x=x, y=y, out_name='catch')
        variable = grid.view('catch', nodata=np.nan)
        variable_avg.append(np.array(variable).mean())
    return variable_avg

def var_parallel(n_cpu, image_file, x_in, y_in):
    tasks = []
    for cpu in range(n_cpu):
        # slice the arguments (not the globals) so the function works for any input
        x_sub = x_in[cpu::n_cpu]  # eg, cpu = 0: x_sub = (10, 40, 70, 100)
        y_sub = y_in[cpu::n_cpu]
        tasks.append(delayed(var)(image_file, x_sub, y_sub))
    # compute() returns a tuple; ans[0] is the list of per-CPU results
    ans = compute(tasks)
    # reassemble solution in the right order
    par_avg = [None] * len(x_in)
    for cpu in range(n_cpu):
        par_avg[cpu::n_cpu] = ans[0][cpu]
    print('AVG (parallel) =', par_avg)
    return par_avg

def main():
    image_file = 'pysheds/data/dem.tif'
    # sequential solution:
    seq_avg = var(image_file, xs, ys)
    print('AVG (sequential)=', seq_avg)
    # parallel solution:
    n_cpu = 3
    dask_client = Client(n_workers=n_cpu)
    par_avg = var_parallel(n_cpu, image_file, xs, ys)
    dask_client.shutdown()
    print('max error=',
          max(abs(s - p) for s, p in zip(seq_avg, par_avg)))

if __name__ == '__main__':
    main()
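The improvement suggested above (reading the image file once, outside the parallel section) might be structured roughly as follows. This is a sketch under stated assumptions, not the answer's method: FakeGrid is a hypothetical stand-in for the pysheds grid, process_subset and run_shared are made-up names, and a standard-library thread pool is swapped in for Dask so the example runs without extra packages. Whether a real pysheds Grid can be copied or shipped to Dask workers cheaply has not been verified here.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


class FakeGrid:
    """Hypothetical stand-in for a pysheds Grid; NOT the real pysheds API."""
    load_count = 0

    def __init__(self, path):
        FakeGrid.load_count += 1  # counts how often the "file" is read
        self.path = path
        self.catch = None

    def catchment(self, x, y):
        # Like the real call, this stores its result on the grid object.
        self.catch = x + y


def process_subset(grid, x_sub, y_sub):
    """Process one subset of points on a private copy of the grid."""
    local_grid = copy.deepcopy(grid)  # tasks never share mutable state
    out = []
    for x, y in zip(x_sub, y_sub):
        local_grid.catchment(x, y)
        out.append(local_grid.catch)
    return out


def run_shared(path, xs, ys, n_workers):
    grid = FakeGrid(path)  # read once, outside the parallel section
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [
            pool.submit(process_subset, grid, xs[k::n_workers], ys[k::n_workers])
            for k in range(n_workers)
        ]
        chunks = [f.result() for f in futures]
    # reassemble the results in the original input order
    merged = [None] * len(xs)
    for k, chunk in enumerate(chunks):
        merged[k::n_workers] = chunk
    return merged


results = run_shared("dem.tif", (10, 20, 30, 40, 50), (1, 2, 3, 4, 5), n_workers=2)
```

Each task works on its own copy because catchment writes its output into the grid, so sharing one object between concurrent tasks would let them overwrite each other's results. Threads only illustrate the load-once structure here; for CPU-bound Python code they may give little or no speed-up, and with process-based workers the grid would have to be serialized to each worker instead.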
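I have tried to provide reproducible code below using dask. You can add the main processing part of pysheds, or any other functions, to it so that the parameters are iterated over in parallel and faster.
The documentation for the dask module can be found here.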
import dask
from dask import delayed, compute
from dask.distributed import Client, progress
from pysheds.grid import Grid
client = Client(threads_per_worker=2, n_workers=2) #Choose the number of workers and threads per worker over here to deploy for your task.
xs = 82.1206, 72.4542, 65.0431, 83.8056, 35.6744
ys = 25.2111, 17.9458, 13.8844, 10.0833, 24.8306
#Firstly, a function has to be created, where the iteration of the parameters is involved.
def var(x, y):
    grid = Grid.from_raster('data.tif', data_name='map')
    grid.catchment(data='map', x=x, y=y, out_name='catch', recursionlimit=1500, xytype='label')
    ...
    ...
    return (result)
#Now calling the function in a 'dask' way.
lazy_results = []
for (x, y) in zip(xs, ys):
    lazy_result = dask.delayed(var)(x, y)
    lazy_results.append(lazy_result)
#Final command to execute the function var(x,y) and get the result.
dask.compute(*lazy_results)