How to speed up this DP function in Python with vectorization

Question

So I have this definition here:

DP[i, j] = f[i, j] + min(DP[i-1, j-1], DP[i-1, j], DP[i-1, j+1])

which defines the minimum accumulated cost of going from the top of the NxM matrix to the bottom. Each cell in f holds the value/cost (e.g. 1.2, 0, 10) of travelling into that cell from one of the three neighbouring cells in the row above.
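To pin down exactly what the recurrence computes, here is a minimal plain-Python reference. It assumes that out-of-range neighbours at the left and right edges are simply skipped, since the definition does not say how edges are handled. The function name `min_path_cost_reference` and the 3x3 input are hypothetical, for illustration only.

```python
import numpy as np

def min_path_cost_reference(f):
    """Reference for DP[i, j] = f[i, j] + min(DP[i-1, j-1], DP[i-1, j], DP[i-1, j+1]).

    Assumption: at the left/right edges, out-of-range neighbours are skipped.
    Returns the full DP table; its last row holds the cost of the cheapest
    top-to-bottom path ending in each column.
    """
    f = np.asarray(f, dtype=float)
    n, m = f.shape
    dp = f.copy()  # row 0 is the base case: DP[0, j] = f[0, j]
    for i in range(1, n):
        for j in range(m):
            lo, hi = max(j - 1, 0), min(j + 2, m)  # clip the 3-wide window at the edges
            dp[i, j] = f[i, j] + dp[i - 1, lo:hi].min()
    return dp

f = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
dp = min_path_cost_reference(f)
print(dp[-1])        # last row: [12. 13. 15.]
print(dp[-1].min())  # cheapest top-to-bottom cost: 12.0
```

For this input the cheapest path runs straight down the left column (1 + 4 + 7 = 12). The double loop is exactly the part that is slow on a 1500x1500 map.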

The matrix may be large (1500x1500; it's the gradient map of an image), and the DP algorithm I wrote takes about a second per run on my matrices. It has to run hundreds of times per execution, so the total program run time comes out to several minutes. This loop accounts for about 99% of the run time, so I am trying to optimize it with Python/NumPy vectorization methods. I only have access to NumPy and SciPy.

Note: I hardly program in Python at all, so the solution may just be obvious; I don't know.

First attempt: just the straightforward loop. Time here is about 2-2.5 seconds per run.

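import numpy as np

DP = f.copy()  # f: 2-D NumPy array of costs (N x M)
for r in range(1, DP.shape[0]):  # Start at row 1: row 0 is the base case and doesn't change
    for c in range(1, DP.shape[1] - 1):  # Edge columns are left unchanged
        DP[r, c] += min(DP[r - 1, c-1:c+2])  # cheapest of the three cells above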

Second attempt: I tried to use the NumPy function "fromiter" to calculate entire rows at a time rather than column by column. Time here is about 1-1.5 seconds per run. My goal is to make this at least an order of magnitude faster, but I am stumped on how else to optimize it.

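import numpy as np

DP = f.astype(float)  # float copy of f, so edge cells can hold np.inf
for r in range(1, DP.shape[0]):  # row 0 is the base case
    def foo(arr):
        idx, val = arr
        # Edge columns lack a left or right neighbour; mark them unreachable
        if idx == 0 or idx == DP.shape[1] - 1:
            return np.inf
        return val + min(DP[r - 1, idx - 1], DP[r - 1, idx], DP[r - 1, idx + 1])

    # np.fromiter requires a dtype; count lets it preallocate the row
    DP[r, :] = np.fromiter(map(foo, enumerate(DP[r, :])), dtype=float, count=DP.shape[1])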

Answer 1

As hpaulj stated, since your problem is inherently sequential it will be hard to fully vectorize. It does seem possible, though: every row can be expressed in terms of a single starting row, the only difference being how many triplets from that row are considered for each of the following rows. So perhaps you can find a smart way to do it!

That being said, a quick, half-vectorized solution is to use the neat sliding-window trick with fancy indexing proposed by user42541, which replaces the inner loop with a vectorized call:

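import numpy as np

DP = f.copy()  # f: 2-D NumPy array of costs, as in the question
indexer = np.arange(3)[:, None] + np.arange(DP.shape[1] - 2)[None, :]  # (3, M-2): column indices of each 3-wide window
for r in range(1, DP.shape[0]):  # row 0 is the base case; edge columns stay unchanged
    DP[r, 1:-1] += np.min(DP[r - 1, indexer], axis=0)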
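To see what the fancy-indexing trick does, here is a small sketch on a hypothetical 5-element previous row. Column k of `indexer` lists the three column indices around interior column k+1, so indexing a row with it gathers every window at once, and `min(axis=0)` reduces each window to a single value.

```python
import numpy as np

# Hypothetical previous row with M = 5 values
prev_row = np.array([4, 1, 7, 3, 9])
M = prev_row.size

# Same construction as above: broadcasting a (3, 1) column against a (1, M-2) row
indexer = np.arange(3)[:, None] + np.arange(M - 2)[None, :]
print(indexer)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]

windows = prev_row[indexer]   # fancy indexing gathers a (3, M-2) array of windows
print(windows.min(axis=0))    # [1 1 3]: min of the 3 upper neighbours of columns 1..3
```

In the answer's loop, `DP[r - 1, indexer]` does the same thing for row r-1, so each row costs one NumPy call instead of M-2 Python-level iterations.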

For a 1500x1500 array of integers, this gives a speed-up of about two orders of magnitude relative to your double-loop method (your vectorized solution didn't work on my PC).
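A variation on the same idea, sketched here as an assumption rather than something benchmarked: instead of building a (3, M-2) array with fancy indexing on every row, take the elementwise minimum of three shifted slices of the previous row. This sketch treats out-of-range neighbours as +inf, so the edge columns are updated too (the code above leaves them unchanged). The function name `dp_shifted_slices` is hypothetical, and no timings were measured for it.

```python
import numpy as np

def dp_shifted_slices(f):
    """Row-by-row DP where each row update uses three shifted slices.

    Assumption: out-of-range neighbours at the left/right edges count as +inf.
    """
    f = np.asarray(f, dtype=float)
    n, m = f.shape
    dp = np.empty_like(f)
    dp[0] = f[0]                     # base case: first row is just f
    padded = np.full(m + 2, np.inf)  # previous row with an inf border on each side
    for i in range(1, n):            # the row loop itself stays sequential
        padded[1:-1] = dp[i - 1]
        # padded[:-2], padded[1:-1], padded[2:] are the left, centre and
        # right neighbours of every column
        dp[i] = f[i] + np.minimum(np.minimum(padded[:-2], padded[1:-1]), padded[2:])
    return dp

# Sanity check against a plain double loop on a small random matrix
rng = np.random.default_rng(0)
f = rng.random((50, 40))
ref = f.copy()
for i in range(1, f.shape[0]):
    for j in range(f.shape[1]):
        ref[i, j] = f[i, j] + ref[i - 1, max(j - 1, 0):j + 2].min()
print(np.allclose(dp_shifted_slices(f), ref))  # the two should agree
```

This still runs one Python-level iteration per row, consistent with the sequential nature of the recurrence. Since SciPy is available, `scipy.ndimage.minimum_filter1d(row, size=3, mode='constant', cval=np.inf)` should compute the same 3-wide windowed minimum per row; whether either variant beats the fancy-indexing version on your data would need measuring.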

Answer 1 (score 2, accepted, 2021-03-21 15:28:59)
