简体   繁体   中英

Speeding up dynamic programming in python/numpy

I have a 2D cost matrix M , perhaps 400x400, and I'm trying to calculate the optimal path through it. As such, I have a function like:

M[i,j] = M[i,j] + min(M[i-1,j-1],M[i-1,j]+P1,M[i,j-1]+P1)

which is obviously recursive. P1 is some additive constant. My code, which works more or less, is:

def optimalcost(cost, P1=10):
    width1,width2 = cost.shape
    M = array(cost)
    for i in range(0,width1):
       for j in range(0,width2):
          try:
              M[i,j] = M[i,j] + min(M[i-1,j-1],M[i-1,j]+P1,M[i,j-1]+P1)
          except:
              M[i,j] = inf
    return M

Now I know looping in Numpy is a terrible idea, and for things like the calculation of the initial cost matrix I've been able to find shortcuts to cutting the time down. However, as I need to evaluate potentially the entire matrix I'm not sure how else to do it. This takes around 3 seconds per call on my machine and must be applied to around 300 of these cost matrices. I'm not sure where this time comes from, as profiling says the 200,000 calls to min only take 0.1s - maybe memory access?

Is there a way to do this in parallel somehow? I assume there may be, but to me it seems each iteration is dependent unless there's a smarter way to memoize things.

There are parallels to this question: Can I avoid Python loop overhead on dynamic programming with numpy?

I'm happy to switch to C if necessary, but I like the flexibility of Python for rapid testing and the lack of faff with file IO. Off the top of my head, is something like the following code likely to be significantly faster?

#define P1 10
void optimalcost(double** costin, double** costout){
    /* 
        We assume that costout is initially
        filled with costin's values.
    */
    float a,b,c,prevcost;

    for(i=0;i<400;i++){
        for(j=0;j<400;j++){
            a = prevcost+P1;
            b = costout[i][j-1]+P1;
            c = costout[i-1][j-1];
            costout[i][j] += min(prevcost,min(b,c));
            prevcost = costout[i][j];
        }
    }
}

return;

Update:

I'm on Mac, and I don't want to install a whole new Python toolchain so I used Homebrew .

> brew install llvm --rtti
> LLVM_CONFIG_PATH=/usr/local/opt/llvm/bin/llvm-config pip install llvmpy
> pip install numba

New "numba'd" code:

from numba import autojit, jit
import time
import numpy as np

@autojit
def cost(left, right):
    height,width = left.shape
    cost = np.zeros((height,width,width))

    for row in range(height):
        for x in range(width):
            for y in range(width):
                cost[row,x,y] = abs(left[row,x]-right[row,y])

    return cost

@autojit
def optimalcosts(initcost):
    costs = zeros_like(initcost)
    for row in range(height):
        costs[row,:,:] = optimalcost(initcost[row])
    return costs

@autojit
def optimalcost(cost):
    width1,width2 = cost.shape
    P1=10
    prevcost = 0.0
    M = np.array(cost)
    for i in range(1,width1):
        for j in range(1,width2):
            M[i,j] += min(M[i-1,j-1],prevcost+P1,M[i,j-1]+P1)
            prevcost = M[i,j]
    return M

prob_size = 400
left = np.random.rand(prob_size,prob_size)
right = np.random.rand(prob_size,prob_size)

print '---------- Numba Time ----------'
t =  time.time()
c = cost(left,right)
optimalcost(c[100])
print time.time()-t

print '---------- Native python Time --'
t =  time.time()
c = cost.py_func(left,right)
optimalcost.py_func(c[100])
print time.time()-t

It's interesting writing code in Python that is so un-Pythonic. Note for anyone interested in writing Numba code, you need to explicitly express loops in your code. Before, I had the neat Numpy one-liner,

abs(left[row,:][:,newaxis] - right[row,:])

to calculate the cost. That took around 7 seconds with Numba. Writing out the loops properly gives 0.5s.

It's an unfair comparison to compare it to native Python code, because Numpy can do that pretty quickly, but:

Numba compiled: 0.509318113327s

Native: 172.70626092s

I'm impressed both by the numbers and how utterly simple the conversion is.

如果你不难切换到Anaconda的Python发行版,你可以尝试使用Numba ,对于这个特殊的简单动态算法,它可能会提供很多加速而不会让你离开Python。

Numpy is usually not very good at iterative jobs (though it do have some commonly used iterative functions such as np.cumsum , np.cumprod , np.linalg.* and etc). But for simple tasks like finding the shortest path (or lowest energy path) above, you can vectorize the problem by thinking about what can be computed at the same time (also try to avoid making copy:

Suppose we are finding a shortest path in the "row" direction (ie horizontally), we can first create our algorithm input:

# The problem, 300 400*400 matrices
# Create infinitely high boundary so that we dont need to handle indexing "-1"
a = np.random.rand(300, 400, 402).astype('f')
a[:,:,::a.shape[2]-1] = np.inf

then prepare some utility arrays which we will use later (creation takes constant time):

# Create self-overlapping view for 3-way minimize
# This is the input in each iteration
# The shape is (400, 300, 400, 3), separately standing for row, batch, column, left-middle-right
A = np.lib.stride_tricks.as_strided(a, (a.shape[1],len(a),a.shape[2]-2,3), (a.strides[1],a.strides[0],a.strides[2],a.strides[2]))

# Create view for output, this is basically for convenience
# The shape is (399, 300, 400). 399 comes from the fact that first row is never modified
B = a[:,1:,1:-1].swapaxes(0, 1)

# Create a temporary array in advance (try to avoid cache miss)
T = np.empty((len(a), a.shape[2]-2), 'f')

and finally do the computation and timeit:

%%timeit
for i in np.arange(a.shape[1]-1):
    A[i].min(2, T)
    B[i] += T

The timing result on my (super old laptop) machine is 1.78s, which is already way faster than 3 minute. I believe you can improve even more (while stick to numpy) by optimize the memory layout and alignment (somehow). Or, you can simply use multiprocessing.Pool . It is easy to use, and this problem is trivial to split to smaller problems (by dividing on the batch axis).

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM