Optimizing matrix writes in python/numpy

I'm currently trying to optimize a piece of code. The gist of it is that we go through and compute a bunch of values and write them to a matrix. The order of computation doesn't matter:

mat =  np.zeros((n, n))
mat.fill(MAX_VAL)
for i in xrange(0, smallerDim):
    for j in xrange(0,n):
        similarityVal = doACalculation(i,j, data, cache)
        mat[i][j] = abs(1.0 / (similarityVal + 1.0))

I've profiled this code and found that approximately 90% of the time is spent writing the value back into the matrix (the last line).

I'm wondering what the optimal way is to do this type of computation so that the writes are cheaper. Should I write to an intermediate buffer and copy in the whole row, for example? I'm a bit clueless about performance tuning and numpy internals.

EDIT: doACalculation is not a side-effect-free function. It takes in some data (assume this is some Python object) and also a cache, which it reads from and writes intermediate steps to. I'm not sure if it can easily be vectorized. I tried using numpy.vectorize as recommended but did not see a significant speedup over the naive for loop. (I passed in the additional data via a state variable.)

Wrapping it in a numba autojit should improve the performance quite a bit (newer Numba releases spell this numba.jit).

import numpy as np
import numba

def doACalculationVector(n, smallerDim):
    return np.ones((smallerDim, n)) + 1


def testVector():
    n = 1000
    smallerDim = 800
    mat =  np.zeros((n, n))
    mat.fill(10) 
    mat[:smallerDim] = abs(1.0 / (doACalculationVector(n, smallerDim) + 1.0))
    return mat

@numba.autojit
def doACalculationNumba(i,j):
    return 2

@numba.autojit
def testNumba():
    n = 1000
    smallerDim = 800
    mat =  np.zeros((n, n))
    mat.fill(10)
    for i in xrange(0, smallerDim):
        for j in xrange(0, n):
            mat[i,j] = abs(1.0 / (doACalculationNumba(i, j) + 1.0))
    return mat

Original timing for reference (with mat[i][j] changed to mat[i,j]):

In [24]: %timeit test()
1 loops, best of 3: 226 ms per loop
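
As a side note on the mat[i][j] to mat[i,j] change, here is a minimal sketch (the array shape and values are made up for illustration) of why the two forms reach the same element but do different amounts of work. mat[i][j] first builds a temporary row view mat[i] and then indexes into it, while mat[i,j] is a single indexing operation:

```python
import numpy as np

mat = np.zeros((3, 4))

# Chained indexing: mat[1] creates a temporary 1-D view of row 1,
# and the assignment then writes through that view.
row = mat[1]
row[2] = 5.0

# Tuple indexing: one indexing operation, no intermediate view object.
mat[2, 3] = 7.0

chained_ok = mat[1, 2] == 5.0   # the write through the view reached mat
row_is_view = row.base is mat   # the row shares memory with mat
```

Both forms give the same result; inside a tight Python loop the extra view object per element is simply avoidable overhead.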

I simplified the function a bit, since this was all that was provided, but testNumba was about 40 times as fast as test when timed, and about 3 times as fast as the vectorized version:

In [20]: %timeit testVector()
100 loops, best of 3: 17.9 ms per loop

In [21]: %timeit testNumba()
100 loops, best of 3: 5.91 ms per loop

If you can vectorize doACalculation, the task becomes easy:

similarityArray = doACalculation(np.indices((smallerDim, n)))
mat[:smallerDim] = np.abs(1.0 / (similarityArray + 1))

This should be at least an order of magnitude faster, assuming you vectorize doACalculation properly. Generally, when working with NumPy arrays, you want to avoid explicit loops and element accesses as much as possible.

For reference, an example vectorization of a possible doACalculation:

# Unvectorized
def doACalculation(i, j):
    return i**2 + i*j + j

# Vectorized
def doACalculation(input):
    i, j = input
    return i**2 + i*j + j

# Vectorized, but with the original call signature
def doACalculation(i, j):
    return i**2 + i*j + j

Yes, the last version really is supposed to be identical to the unvectorized function. It sometimes is that easy.
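
To make that concrete, here is a small sketch that runs the same example function once element by element and once on whole index grids, then checks that both fill the matrix identically. The sizes and the MAX_VAL placeholder are assumptions chosen for illustration, and range is used so it also runs on Python 3:

```python
import numpy as np

def doACalculation(i, j):
    # Works for scalars and for NumPy arrays alike.
    return i**2 + i*j + j

n = 6
smallerDim = 4
MAX_VAL = 10.0  # placeholder fill value, an assumption for this sketch

# Loop version, as in the question.
mat_loop = np.full((n, n), MAX_VAL)
for i in range(smallerDim):
    for j in range(n):
        mat_loop[i, j] = abs(1.0 / (doACalculation(i, j) + 1.0))

# Vectorized version: np.indices returns the i and j index grids.
i_grid, j_grid = np.indices((smallerDim, n))
mat_vec = np.full((n, n), MAX_VAL)
mat_vec[:smallerDim] = np.abs(1.0 / (doACalculation(i_grid, j_grid) + 1.0))

same = np.allclose(mat_loop, mat_vec)
```

This only works because the example function uses nothing but arithmetic; a function with branching on its arguments or with side effects (like the cache in the question) would need more work.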

Even if you can't vectorize doACalculation(), you can use numpy.vectorize() to speed up the calculation. Here is a test.

import numpy as np
n = 1000
smallerDim = 500

def doACalculation(i, j):
    return i+j

For loop version:

%%timeit
mat =  np.zeros((n, n))

for i in xrange(0, smallerDim):
    for j in xrange(0,n):
        similarityVal = doACalculation(i,j)
        mat[i,j] = abs(1.0 / (similarityVal + 1.0))

output:

1 loops, best of 3: 183 ms per loop

vectorize() version:

%%timeit
mat2 =  np.zeros((n, n))
i, j = np.ix_(np.arange(smallerDim), np.arange(n))
f = np.vectorize(doACalculation, "d")
mat2[:smallerDim] = np.abs(1.0/(f(i, j) + 1))

output:

10 loops, best of 3: 97.3 ms per loop

Test result:

np.allclose(mat,mat2)

output:

True

This method doesn't make the calls to doACalculation() much faster, but it makes it possible to do the subsequent calculations in vectorized form.
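
A rough way to see this, assuming a stand-in doACalculation that just counts how often it is called (the counter and the tiny sizes are illustrative only): numpy.vectorize still calls the Python function once per element, and only the arithmetic applied to its result afterwards runs as array code.

```python
import numpy as np

calls = {"count": 0}

def doACalculation(i, j):
    # Stand-in for an expensive, non-vectorizable Python function.
    calls["count"] += 1
    return i + j

smallerDim, n = 3, 4
i, j = np.ix_(np.arange(smallerDim), np.arange(n))

# otypes is given explicitly, so NumPy does not need a trial call
# to infer the output type.
f = np.vectorize(doACalculation, otypes=[float])
sim = f(i, j)                       # one Python-level call per element
result = np.abs(1.0 / (sim + 1.0))  # this step runs as vectorized NumPy code
```

After this runs, calls["count"] equals smallerDim * n, which is why the per-call cost of the function is unchanged.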
