Apply a function to each row of an ndarray

I have this function to calculate squared Mahalanobis distance of vector x to mean:

def mahalanobis_sqdist(x, mean, Sigma):
    """
    Calculates squared Mahalanobis Distance of vector x
    to distribution's mean
    """
    Sigma_inv = np.linalg.inv(Sigma)
    xdiff = x - mean
    sqmdist = np.dot(np.dot(xdiff, Sigma_inv), xdiff)
    return sqmdist

I have a NumPy array of shape (25, 4), and I want to apply that function to all 25 rows of the array without a for loop. Basically, how can I write the vectorized form of this loop:

for r in d1:
    mahalanobis_sqdist(r[0:4], mean1, Sig1)

where mean1 and Sig1 are:

>>> mean1
array([ 5.028,  3.48 ,  1.46 ,  0.248])
>>> Sig1 = np.cov(d1[0:25, 0:4].T)
>>> Sig1
array([[ 0.16043333,  0.11808333,  0.02408333,  0.01943333],
       [ 0.11808333,  0.13583333,  0.00625   ,  0.02225   ],
       [ 0.02408333,  0.00625   ,  0.03916667,  0.00658333],
       [ 0.01943333,  0.02225   ,  0.00658333,  0.01093333]])

I have tried the following but it didn't work:

>>> vecdist = np.vectorize(mahalanobis_sqdist)
>>> vecdist(d1, mean1, Sig1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/numpy/lib/function_base.py", line 1862, in __call__
    theout = self.thefunc(*newargs)
  File "<stdin>", line 6, in mahalanobis_sqdist
  File "/usr/lib/python2.7/dist-packages/numpy/linalg/linalg.py", line 445, in inv
    return wrap(solve(a, identity(a.shape[0], dtype=a.dtype)))
IndexError: tuple index out of range

To apply a function to each row of an array, you could use:

np.apply_along_axis(mahalanobis_sqdist, 1, d1, mean1, Sig1)    
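
As a small illustration of how np.apply_along_axis forwards the extra arguments, here is a sketch with a hypothetical helper, weighted_sum (not part of the question). The function receives one 1-D row at a time, followed by the additional arguments unchanged:

```python
import numpy as np

def weighted_sum(row, weights, offset):
    # row is a 1-D slice of the input; weights and offset are forwarded as-is
    return np.dot(row, weights) + offset

a = np.arange(6.0).reshape(3, 2)      # rows: [0, 1], [2, 3], [4, 5]
weights = np.array([1.0, 10.0])

# axis=1 means the function is called once per row
out = np.apply_along_axis(weighted_sum, 1, a, weights, 0.5)
print(out)
```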

In this case, however, there is a better way. You don't have to apply a function to each row. Instead, you can apply NumPy operations to the entire d1 array to calculate the same result. np.einsum can replace the for-loop and the two calls to np.dot:

def mahalanobis_sqdist2(d, mean, Sigma):
   Sigma_inv = np.linalg.inv(Sigma)
   xdiff = d - mean
   return np.einsum('ij,im,mj->i', xdiff, xdiff, Sigma_inv)
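
To see what the subscripts 'ij,im,mj->i' compute, the sketch below compares the einsum call with the same quadratic form written as the diagonal of a matrix product. The data is random and the variable names are illustrative only; the diagonal version builds a full N x N matrix, so it is for checking rather than for production use:

```python
import numpy as np

rng = np.random.default_rng(0)
d = rng.random((5, 3))
mean = d.mean(axis=0)
Sigma_inv = np.linalg.inv(np.cov(d.T))
xdiff = d - mean

# out[i] = sum over j and m of xdiff[i, j] * xdiff[i, m] * Sigma_inv[m, j]
via_einsum = np.einsum('ij,im,mj->i', xdiff, xdiff, Sigma_inv)

# Same values: the diagonal of xdiff @ Sigma_inv @ xdiff.T
via_matmul = np.diag(xdiff @ Sigma_inv @ xdiff.T)

print(np.allclose(via_einsum, via_matmul))
```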

Here are some benchmarks:

import numpy as np

def mahalanobis_sqdist(x, mean, Sigma):
    """
    Calculates squared Mahalanobis Distance of vector x
    to distribution's mean
    """
    Sigma_inv = np.linalg.inv(Sigma)
    xdiff = x - mean
    sqmdist = np.dot(np.dot(xdiff, Sigma_inv), xdiff)
    return sqmdist

def mahalanobis_sqdist2(d, mean, Sigma):
   Sigma_inv = np.linalg.inv(Sigma)
   xdiff = d - mean
   return np.einsum('ij,im,mj->i', xdiff, xdiff, Sigma_inv)

def using_loop(d1, mean, Sigma):
    expected = []
    for r in d1:
        # use the function's own parameters rather than the globals mean1/Sig1
        expected.append(mahalanobis_sqdist(r[0:4], mean, Sigma))
    return np.array(expected)

d1 = np.random.random((25,4))
mean1 = np.array([ 5.028,  3.48 ,  1.46 ,  0.248])
Sig1 = np.cov(d1[0:25, 0:4].T)

expected = using_loop(d1, mean1, Sig1)
result = np.apply_along_axis(mahalanobis_sqdist, 1, d1, mean1, Sig1)
result2 = mahalanobis_sqdist2(d1, mean1, Sig1)
assert np.allclose(expected, result)
assert np.allclose(expected, result2)

In [92]: %timeit mahalanobis_sqdist2(d1, mean1, Sig1)
10000 loops, best of 3: 31.1 µs per loop
In [94]: %timeit using_loop(d1, mean1, Sig1)
1000 loops, best of 3: 569 µs per loop
In [91]: %timeit np.apply_along_axis(mahalanobis_sqdist, 1, d1, mean1, Sig1)
1000 loops, best of 3: 806 µs per loop

Thus mahalanobis_sqdist2 is about 18x faster than a for-loop, and 26x faster than using np.apply_along_axis.

Note that np.apply_along_axis, np.vectorize, and np.frompyfunc are Python utility functions. Under the hood they use for- or while-loops. There is no real "vectorization" going on here. They can provide syntactic assistance, but don't expect them to make your code perform any better than a for-loop you write yourself.
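
A small sketch that counts Python-level calls makes this visible. The names square and calls are illustrative; otypes is passed so that np.vectorize does not make an extra call to infer the output type:

```python
import numpy as np

calls = []

def square(v):
    calls.append(v)      # record every Python-level call
    return v * v

vsquare = np.vectorize(square, otypes=[float])
out = vsquare(np.arange(4.0))

# The wrapped function ran once for each of the 4 elements
print(len(calls), out)
```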

The answer by @unutbu works very nicely for applying any function to the rows of an array. In this particular case, there are some mathematical symmetries you can use that will speed things up considerably if you are working with large arrays.

Here is a modified version of your function:

def mahalanobis_sqdist3(x, mean, Sigma):
    Sigma_inv = np.linalg.inv(Sigma)
    xdiff = x - mean
    return (xdiff.dot(Sigma_inv)*xdiff).sum(axis=-1)

If you end up using any sort of large Sigma, I would recommend that you cache Sigma_inv and pass that in as an argument to your function instead. Since it is 4x4 in this example, this doesn't matter. I'll show how to deal with large Sigma anyway for anyone else who comes across this.
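
A minimal sketch of the caching idea, assuming a hypothetical variant named mahalanobis_sqdist_cached that takes the already-inverted matrix. The inverse is computed once and reused for several batches of points:

```python
import numpy as np

def mahalanobis_sqdist_cached(x, mean, Sigma_inv):
    # Hypothetical variant: expects the precomputed inverse covariance
    xdiff = x - mean
    return (xdiff.dot(Sigma_inv) * xdiff).sum(axis=-1)

rng = np.random.default_rng(0)
data = rng.random((25, 4))
mean = data.mean(axis=0)
Sigma_inv = np.linalg.inv(np.cov(data.T))   # invert once

# Reuse the same inverse for each batch
batch_a = mahalanobis_sqdist_cached(data[:10], mean, Sigma_inv)
batch_b = mahalanobis_sqdist_cached(data[10:], mean, Sigma_inv)
```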

If you aren't going to be using the same Sigma repeatedly, you won't be able to cache it, so, instead of inverting the matrix, you could use a different method to solve the linear system. Here I'll use the LU decomposition built into SciPy. This only improves the time if the number of columns of x is large relative to its number of rows.

Here is a function that shows that approach:

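from scipy.linalg import lu_factor, lu_solve

def mahalanobis_sqdist4(x, mean, Sigma):
    xdiff = x - mean
    # lu_factor returns the LU factorization and pivots, not an inverse
    lu_piv = lu_factor(Sigma)
    # solve Sigma @ Y = xdiff.T, then take the column-wise dot products
    return (xdiff.T*lu_solve(lu_piv, xdiff.T)).sum(axis=0)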
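
As a quick sanity check, the sketch below (random data, illustrative variable names) confirms that solving the system with the LU factors gives the same distances as multiplying by the explicit inverse:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(1)
x = rng.random((6, 3))
mean = x.mean(axis=0)
Sigma = np.cov(x.T)
xdiff = x - mean

# Factor once, then solve Sigma @ Y = xdiff.T for all points together
lu_piv = lu_factor(Sigma)
solved = lu_solve(lu_piv, xdiff.T)           # shape (3, 6)
via_lu = (xdiff.T * solved).sum(axis=0)

# Reference result using the explicit inverse
via_inv = (xdiff.dot(np.linalg.inv(Sigma)) * xdiff).sum(axis=-1)

print(np.allclose(via_lu, via_inv))
```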

Here are some timings. I'll include the version with einsum as mentioned in the other answer.

import numpy as np
Sig1 = np.array([[ 0.16043333,  0.11808333,  0.02408333,  0.01943333],
                 [ 0.11808333,  0.13583333,  0.00625   ,  0.02225   ],
                 [ 0.02408333,  0.00625   ,  0.03916667,  0.00658333],
                 [ 0.01943333,  0.02225   ,  0.00658333,  0.01093333]])
mean1 = np.array([ 5.028,  3.48 ,  1.46 ,  0.248])
x = np.random.rand(25, 4)
%timeit np.apply_along_axis(mahalanobis_sqdist, 1, x, mean1, Sig1)
%timeit mahalanobis_sqdist2(x, mean1, Sig1)
%timeit mahalanobis_sqdist3(x, mean1, Sig1)
%timeit mahalanobis_sqdist4(x, mean1, Sig1)


1000 loops, best of 3: 973 µs per loop
10000 loops, best of 3: 36.2 µs per loop
10000 loops, best of 3: 40.8 µs per loop
10000 loops, best of 3: 83.2 µs per loop

However, changing the sizes of the arrays involved changes the timing results. For example, letting x = np.random.rand(2500, 4), the timings are:

10 loops, best of 3: 95 ms per loop
1000 loops, best of 3: 355 µs per loop
10000 loops, best of 3: 131 µs per loop
1000 loops, best of 3: 337 µs per loop

And letting x = np.random.rand(1000, 1000), Sig1 = np.random.rand(1000, 1000), and mean1 = np.random.rand(1000), the timings are:

1 loops, best of 3: 1min 24s per loop
1 loops, best of 3: 2.39 s per loop
10 loops, best of 3: 155 ms per loop
10 loops, best of 3: 99.9 ms per loop

Edit: I noticed that one of the other answers used the Cholesky decomposition. Given that Sigma is symmetric and positive definite, we can actually do better than my above results. There are some good routines from BLAS and LAPACK available through SciPy that can work with symmetric positive-definite matrices. Here are two faster versions.

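import numpy as np
# Older SciPy releases exposed these as scipy.linalg.fblas / scipy.linalg.flapack
from scipy.linalg.blas import dsymm
from scipy.linalg.lapack import dposv

def mahalanobis_sqdist5(x, mean, Sigma):
    xdiff = x - mean
    Sigma_inv = np.linalg.inv(Sigma)
    # dsymm computes the symmetric matrix product Sigma_inv @ xdiff.T
    return np.einsum('...i,...i->...', dsymm(1., Sigma_inv, xdiff.T).T, xdiff)

def mahalanobis_sqdist6(x, mean, Sigma):
    xdiff = x - mean
    # dposv solves Sigma @ X = xdiff.T for symmetric positive-definite Sigma;
    # the solution is the second element of the returned tuple
    return np.einsum('...i,...i->...', xdiff, dposv(Sigma, xdiff.T)[1].T)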

The first one still inverts Sigma. If you pre-compute the inverse and reuse it, it is much faster (the 1000x1000 case takes 35.6 ms on my machine with the pre-computed inverse). I also used einsum to take the product then sum along the last axis. This ended up being marginally faster than doing something like (A * B).sum(axis=-1). These two functions give the following timings:

First test case:

10000 loops, best of 3: 55.3 µs per loop
100000 loops, best of 3: 14.2 µs per loop

Second test case:

10000 loops, best of 3: 121 µs per loop
10000 loops, best of 3: 79 µs per loop

Third test case:

10 loops, best of 3: 92.5 ms per loop
10 loops, best of 3: 48.2 ms per loop
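
The einsum pattern '...i,...i->...' used in these two functions is a row-wise dot product. A small sketch with random arrays (A and B are illustrative names) shows it matches the multiply-then-sum form:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((7, 4))
B = rng.random((7, 4))

# Multiply matching entries and sum over the last axis, one value per row
rowwise_einsum = np.einsum('...i,...i->...', A, B)
rowwise_sum = (A * B).sum(axis=-1)

print(np.allclose(rowwise_einsum, rowwise_sum))
```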

Just saw a really nice comment on reddit that might speed things up even a little more:

This is not surprising to anyone who uses numpy regularly. For loops in python are horribly slow. Actually, einsum is pretty slow too. Here's a version that is faster if you have lots of vectors (500 vectors in 4 dimensions is enough to make this version faster than einsum on my machine):

def no_einsum(d, mean, Sigma):
    L_inv = np.linalg.inv(np.linalg.cholesky(Sigma))
    xdiff = d - mean
    return np.sum(np.dot(xdiff, L_inv.T)**2, axis=1)

If your points are also high dimensional then computing the inverse is slow (and generally a bad idea anyway) and you can save time by solving the system directly (500 vectors in 250 dimensions is enough to make this version the fastest on my machine):

def no_einsum_solve(d, mean, Sigma):
    L = np.linalg.cholesky(Sigma)
    xdiff = d - mean
    return np.sum(np.linalg.solve(L, xdiff.T)**2, axis=0)
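
Both versions rely on the same identity. A short derivation, writing \delta = x - \mu for the difference vector and \Sigma = L L^T for the Cholesky factorization (the symbols \delta and \mu are notation introduced here, not from the quoted comment):

```latex
\delta^{T} \Sigma^{-1} \delta
  = \delta^{T} (L L^{T})^{-1} \delta
  = \delta^{T} L^{-T} L^{-1} \delta
  = (L^{-1}\delta)^{T} (L^{-1}\delta)
  = \lVert L^{-1} \delta \rVert_2^{2}
```

So the squared distance is the sum of squares of L^{-1} \delta. no_einsum forms L^{-1} explicitly, while no_einsum_solve obtains L^{-1} \delta by solving L y = \delta.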

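The problem is that np.vectorize vectorizes over all arguments, but you need to vectorize only over the first one. Use the excluded keyword argument of np.vectorize to keep mean and Sigma fixed. Because np.vectorize passes scalars to the function by default, you also need a signature (NumPy 1.12 or later) so that each call receives a whole row:

np.vectorize(mahalanobis_sqdist, excluded=[1, 2], signature='(n)->()')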
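
A runnable sketch of this approach, using excluded together with signature='(n)->()' so that each call receives a whole row (the signature argument needs NumPy 1.12 or later). The data here is random, not the asker's d1, and the result is checked against a plain loop:

```python
import numpy as np

def mahalanobis_sqdist(x, mean, Sigma):
    Sigma_inv = np.linalg.inv(Sigma)
    xdiff = x - mean
    return np.dot(np.dot(xdiff, Sigma_inv), xdiff)

rng = np.random.default_rng(0)
d1 = rng.random((25, 4))
mean1 = d1.mean(axis=0)
Sig1 = np.cov(d1.T)

# excluded=[1, 2]: mean and Sigma are passed through untouched
# signature='(n)->()': the first argument is consumed one 1-D row at a time
vecdist = np.vectorize(mahalanobis_sqdist, excluded=[1, 2], signature='(n)->()')
result = vecdist(d1, mean1, Sig1)

# Reference: explicit Python loop over the rows
expected = np.array([mahalanobis_sqdist(r, mean1, Sig1) for r in d1])
print(np.allclose(result, expected))
```

This is still a Python-level loop internally, so it is a convenience rather than a speed-up.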

