How to best optimize calculations iterated over an NxM grid in Python

Question

Working in Python, I am doing some physics calculations over an NxM grid of values, where N goes from 1 to 3108 and M goes from 1 to 2304 (this corresponds to a large image). I need to calculate a value at each and every point in this space, which totals ~7 million calculations. My current approach is painfully slow, and I am wondering if there is a way to complete this task without it taking hours...

My first approach was just to use nested for loops, but this seemed like the least efficient way to solve my problem. I have tried using NumPy's nditer and iterating over each axis individually, but I've read that it doesn't actually speed up computations. Rather than looping through each axis individually, I also tried making a 3-D array and looping through the outer axis, as shown in Brian's answer to "How can I, in python, iterate over multiple 2d lists at once, cleanly?". Here is the current state of my code:

import numpy as np
x,y = np.linspace(1,3108,num=3108),np.linspace(1,2304,num=2304) # x&y dimensions of image
X,Y = np.meshgrid(x,y,indexing='ij')
all_coords = np.dstack((X,Y)) # moves to 3-D
all_coords = all_coords.astype(int) # sets coords to int

For reference, all_coords looks like this:

array([[[1.000e+00, 1.000e+00],
        [1.000e+00, 2.000e+00],
        [1.000e+00, 3.000e+00],
        ...,
        [1.000e+00, 2.302e+03],
        [1.000e+00, 2.303e+03],
        [1.000e+00, 2.304e+03]],

       [[2.000e+00, 1.000e+00],
        [2.000e+00, 2.000e+00],
        [2.000e+00, 3.000e+00],
        ...,
        [2.000e+00, 2.302e+03],
        [2.000e+00, 2.303e+03],
        [2.000e+00, 2.304e+03]],

and so on. Back to my code...

'''
- below is a function that does a calculation on the full grid using the distance between x0,y0 and each point on the grid.
- the function takes x0,y0 and returns the calculated values across the grid
'''
def do_calc(x0,y0):
    del_x, del_y = X-x0, Y-y0
    np.seterr(divide='ignore', invalid='ignore')
    dmx_ij = (del_x/((del_x**2)+(del_y**2))) # x component
    dmy_ij = (del_y/((del_x**2)+(del_y**2))) # y component
    return dmx_ij,dmy_ij

# now the actual loop

def do_loop():
    dmx,dmy = 0,0
    for pair in all_coords:
        for xi,yi in pair:
            DM = do_calc(xi,yi)
            dmx,dmy = dmx+DM[0],dmy+DM[1]
    return dmx,dmy

As you might see, this code takes an incredibly long time to run... If there is any way to modify my code so that it doesn't take hours to complete, I would be extremely interested in knowing how to do that. Thanks in advance for the help.

Answer 1

Here is a method that gives a 10,000x speedup at N=310, M=230. Because the method scales better than the original code (the original does work proportional to (N*M)**2, since each of the N*M calls to do_calc touches all N*M grid points, whereas this method does work proportional to N*M), I'd expect a factor of more than a million at the full problem size.

The method exploits the shift invariance of the problem. For example, del_x**2 is the same, up to a shift, at each call of do_calc, so we compute it only once.
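To make the shift invariance concrete, here is a minimal sketch on a tiny grid. It checks that the x component returned for any source point (x0, y0) is just an N x M slice of one larger array computed over all possible offsets. The helper names x_component and window are made up for this illustration, and the 0/0 entry is set to 0, which is what the nan_to_num call in the code further down does.

```python
import numpy as np

N, M = 4, 3  # tiny grid, for illustration only

# Grid coordinates 1..N and 1..M, as in the question
X, Y = np.meshgrid(np.arange(1, N + 1), np.arange(1, M + 1), indexing='ij')

def x_component(x0, y0):
    """x component as in the question's do_calc, with the 0/0 entry set to 0."""
    del_x, del_y = X - x0, Y - y0
    den = del_x**2 + del_y**2
    den[den == 0] = 1  # the numerator is 0 there too, so the entry becomes 0
    return del_x / den

# One kernel covering every possible offset:
# dx in -(N-1)..N-1 (rows), dy in -(M-1)..M-1 (columns)
dx = np.arange(-(N - 1), N)[:, None]
dy = np.arange(-(M - 1), M)[None, :]
kden = dx**2 + dy**2
kden[N - 1, M - 1] = 1  # zero offset
kernel = dx / kden

def window(x0, y0):
    """The N x M slice of `kernel` that corresponds to source point (x0, y0)."""
    return kernel[N - x0:2 * N - x0, M - y0:2 * M - y0]

# Every call of x_component is reproduced by a slice of the single kernel
all_match = all(
    np.allclose(x_component(x0, y0), window(x0, y0))
    for x0 in range(1, N + 1) for y0 in range(1, M + 1)
)
print(all_match)
```

Summing over all source points therefore amounts to summing shifted windows of one array, which is what makes the fast method possible.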

If the output of do_calc is weighted before summation, the problem is no longer fully translation invariant and this method no longer works. The result can, however, then be expressed as a linear convolution. At N=310, M=230 this still leaves us with a more than 1,000x speedup, and, again, the gain will be larger at the full problem size.
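The convolution claim can be checked in a reduced setting. The sketch below is a 1-D analogue, not the full 2-D problem: in one dimension d / d**2 reduces to 1 / d, the weights are random stand-ins, and np.convolve is used in place of the FFT-based routine so that only NumPy is needed.

```python
import numpy as np

N = 6
rng = np.random.default_rng(1)
values = rng.random(N)   # hypothetical weights, one per source point x0 = 1..N
x = np.arange(1, N + 1)  # 1-D grid positions

# Brute force: each source point adds its weighted contribution to every grid point
brute = np.zeros(N)
for x0, v in zip(x, values):
    d = x - x0
    safe = np.where(d == 0, 1, d)                    # avoid dividing by zero
    brute += v * np.where(d == 0, 0.0, 1.0 / safe)   # contribution is 0 at d == 0

# The same sums as a single linear convolution with a kernel over all offsets
offsets = np.arange(-(N - 1), N)
safe = np.where(offsets == 0, 1, offsets)
kernel = np.where(offsets == 0, 0.0, 1.0 / safe)
conv = np.convolve(kernel, values, mode='valid')     # length N

print(np.allclose(brute, conv))
```

The 'valid' mode keeps exactly the N outputs for which the weights fully overlap the kernel, which is also why the 2-D code passes 'valid' to its convolution.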

Code for original problem:

import numpy as np

#N, M = 3108, 2304
N, M = 310, 230

### OP's code

x,y = np.linspace(1,N,num=N),np.linspace(1,M,num=M) # x&y dimensions of image
X,Y = np.meshgrid(x,y,indexing='ij')
all_coords = np.dstack((X,Y)) # moves to 3-D
all_coords = all_coords.astype(int) # sets coords to int

'''
- below is a function that does a calculation on the full grid using the distance between x0,y0 and each point on the grid.
- the function takes x0,y0 and returns the calculated values across the grid
'''
def do_calc(x0,y0):
    del_x, del_y = X-x0, Y-y0
    np.seterr(divide='ignore', invalid='ignore')
    dmx_ij = (del_x/((del_x**2)+(del_y**2))) # x component
    dmy_ij = (del_y/((del_x**2)+(del_y**2))) # y component
    return np.nan_to_num(dmx_ij), np.nan_to_num(dmy_ij)

# now the actual loop

def do_loop():
    dmx,dmy = 0,0
    for pair in all_coords:
        for xi,yi in pair:
            DM = do_calc(xi,yi)
            dmx,dmy = dmx+DM[0],dmy+DM[1]
    return dmx,dmy

from time import time

t = [time()]

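### pp's code

# all possible offsets, -(N-1)..N-1 and -(M-1)..M-1, as open (column, row) grids
x, y = np.ogrid[-N+1:N-1:2j*N - 1j, -M+1:M-1:2j*M - 1j]
den = x*x + y*y      # squared distance for every offset
den[N-1, M-1] = 1    # zero offset: avoid 0/0 (the numerator is 0, so the term is 0)
xx = x / den         # x component for every offset
yy = y / den         # y component for every offset
# sliding N x M window sums via prefix sums: first subtract the element
# that drops out of each window, then take cumulative sums along both axes
for zz in xx, yy:
    zz[N:] -= zz[:N-1]
    zz[:, M:] -= zz[:, :M-1]
XX = xx.cumsum(0)[N-1:].cumsum(1)[:, M-1:]
YY = yy.cumsum(0)[N-1:].cumsum(1)[:, M-1:]
t.append(time())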

### call OP's code for reference

X_OP, Y_OP = do_loop()
t.append(time())

# make sure results are equal

assert np.allclose(XX, X_OP)
assert np.allclose(YY, Y_OP)
print('pp {}\nOP {}'.format(*np.diff(t)))

Sample run:

pp 0.015251636505126953
OP 149.1642508506775

Code for weighted problem:

import numpy as np

#N, M = 3108, 2304
N, M = 310, 230

values = np.random.random((N, M))
x,y = np.linspace(1,N,num=N),np.linspace(1,M,num=M) # x&y dimensions of image
X,Y = np.meshgrid(x,y,indexing='ij')
all_coords = np.dstack((X,Y)) # moves to 3-D
all_coords = all_coords.astype(int) # sets coords to int

'''
- below is a function that does a calculation on the full grid using the distance between x0,y0 and each point on the grid.
- the function takes x0,y0 and returns the calculated values across the grid
'''
def do_calc(x0,y0, v):
    del_x, del_y = X-x0, Y-y0
    np.seterr(divide='ignore', invalid='ignore')
    dmx_ij = (del_x/((del_x**2)+(del_y**2))) # x component
    dmy_ij = (del_y/((del_x**2)+(del_y**2))) # y component
    return v*np.nan_to_num(dmx_ij), v*np.nan_to_num(dmy_ij)

# now the actual loop

def do_loop():
    dmx,dmy = 0,0
    for pair, vv in zip(all_coords, values):
        for (xi,yi), v in zip(pair, vv):
            DM = do_calc(xi,yi, v)
            dmx,dmy = dmx+DM[0],dmy+DM[1]
    return dmx,dmy

from time import time
from scipy import signal

t = [time()]
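# all possible offsets, as in the unweighted code
x, y = np.ogrid[-N+1:N-1:2j*N - 1j, -M+1:M-1:2j*M - 1j]
den = x*x + y*y
den[N-1, M-1] = 1    # zero offset: avoid 0/0 (the numerator is 0, so the term is 0)
xx = x / den
yy = y / den
# the weighted sum over all source points is a linear convolution of kernel and weights
XX, YY = (signal.fftconvolve(zz, values, 'valid') for zz in (xx, yy))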

t.append(time())
X_OP, Y_OP = do_loop()
t.append(time())
assert np.allclose(XX, X_OP)
assert np.allclose(YY, Y_OP)
print('pp {}\nOP {}'.format(*np.diff(t)))

Sample run:

pp 0.12683939933776855
OP 158.35225439071655

Answer 1 (accepted): 2019-03-31 00:57:48
