
Optimize this NumPy operation

I have inherited some code and there is one particular operation that takes an inordinate amount of time.

The operation is defined as:

import numpy as np

cutoff = 0.2
# X has shape (76187, 247, 20)
X_flat = X.reshape((X.shape[0], X.shape[1] * X.shape[2]))
# For each row x: count how many rows of X_flat have a dot product with x,
# normalized by x's squared norm, above 1 - cutoff, and take the reciprocal.
weightfun = lambda x: 1.0 / np.sum(np.dot(X_flat, x) / np.dot(x, x) > 1 - cutoff)
# This is expensive...
N_list = np.array(list(map(weightfun, X_flat)))

This takes hours to compute on my machine. I am wondering if there is a way to optimize it. The code is computing normalized Hamming distances between vector sequences.

weightfun performs two dot product operations for every row of X_flat. The worst one is np.dot(X_flat, x), where the dot product is taken against the whole X_flat matrix. But there's a trick to speed things up: the iterative part of that first dot product can be computed just once, for all rows at the same time, with:

X_matmul = X_flat @ X_flat.T

Also, I noticed that the second dot product is nothing more than the diagonal of the result of the first one.
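
To see why, note that entry (i, i) of X_flat @ X_flat.T is exactly np.dot(X_flat[i], X_flat[i]). A quick sanity check (the small random matrix A here is just for illustration, not from the original code):

import numpy as np

A = np.random.rand(5, 3)
G = A @ A.T
# Row-wise self dot products equal the diagonal of the Gram matrix.
assert np.allclose(G.diagonal(), np.einsum('ij,ij->i', A, A))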

The rewritten code looks like this:

cutoff = 0.2
# X has shape (76187, 247, 20)
X_flat = X.reshape((X.shape[0], X.shape[1] * X.shape[2]))

# Gram matrix (the X_matmul product above): X1[j, i] == np.dot(X_flat[j], X_flat[i])
X1 = X_flat @ X_flat.T
# Squared row norms: X2[i] == np.dot(X_flat[i], X_flat[i])
X2 = X1.diagonal()

# Dividing column i by X2[i] gives X1[j, i] / X2[i], the quantity tested in
# weightfun; summing the boolean result over axis 0 counts the matches per i.
N_list = 1.0 / (X1 / X2 > 1 - cutoff).sum(axis=0)
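
One caveat: X1 is a 76187 × 76187 array, roughly 46 GB in float64, so the full Gram matrix may not fit in RAM. If it doesn't, the same vectorized speedup can be kept by processing the rows in blocks. The sketch below is my own variant, not from the original post; the name fast_weights and the chunk size of 1024 are arbitrary choices:

import numpy as np

def fast_weights(X_flat, cutoff=0.2, chunk=1024):
    # Squared row norms, i.e. the diagonal of X_flat @ X_flat.T,
    # computed without materializing the full Gram matrix.
    norms = np.einsum('ij,ij->i', X_flat, X_flat)
    counts = np.zeros(X_flat.shape[0], dtype=np.int64)
    # Only a (chunk, n) slice of the Gram matrix is in memory at a time.
    for start in range(0, X_flat.shape[0], chunk):
        block = X_flat[start:start + chunk] @ X_flat.T
        # block[k, i] / norms[i] > 1 - cutoff mirrors the original test;
        # summing over the block's rows accumulates the per-column counts.
        counts += (block / norms > 1 - cutoff).sum(axis=0)
    return 1.0 / counts

On the same data, fast_weights(X_flat) should agree with the N_list computed above, while peak memory stays bounded by the chunk size.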
