
Optimize this NumPy operation

I have inherited some code and there is one particular operation that takes an inordinate amount of time.

The operation is defined as:

import numpy as np

cutoff = 0.2
# X has shape (76187, 247, 20)
X_flat = X.reshape((X.shape[0], X.shape[1] * X.shape[2]))
# For each row x: count how many rows of X_flat have a dot product with x,
# normalized by x's squared norm, above 1 - cutoff, and take the reciprocal.
weightfun = lambda x: 1.0 / np.sum(np.dot(X_flat, x) / np.dot(x, x) > 1 - cutoff)
# This is expensive...
N_list = np.array(list(map(weightfun, X_flat)))

This takes hours to compute on my machine. I am wondering if there is a way to optimize it. The code is computing normalized Hamming distances between vector sequences.

weightfun performs two dot product operations for every row of X_flat. The worst one is np.dot(X_flat, x), where the dot product is taken against the whole X_flat matrix. But there's a trick to speed things up: the iterative part of that first dot product can be computed just once, for all rows at the same time, with:

X_matmul = X_flat @ X_flat.T

Also, I noticed that the second dot product is nothing more than the diagonal of the result of the first one.
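
To see why, note that entry (i, i) of X_flat @ X_flat.T is exactly np.dot(X_flat[i], X_flat[i]). A quick sanity check (the small random matrix A here is just for illustration, not from the original code):

import numpy as np

A = np.random.rand(5, 3)
G = A @ A.T
# Row-wise self dot products equal the diagonal of the Gram matrix.
assert np.allclose(G.diagonal(), np.einsum('ij,ij->i', A, A))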

The rewritten code looks like this:

cutoff = 0.2
# X has shape (76187, 247, 20)
X_flat = X.reshape((X.shape[0], X.shape[1] * X.shape[2]))

# Gram matrix (the X_matmul product above): X1[j, i] == np.dot(X_flat[j], X_flat[i])
X1 = X_flat @ X_flat.T
# Squared row norms: X2[i] == np.dot(X_flat[i], X_flat[i])
X2 = X1.diagonal()

# Dividing column i by X2[i] gives X1[j, i] / X2[i], the quantity tested in
# weightfun; summing the boolean result over axis 0 counts the matches per i.
N_list = 1.0 / (X1 / X2 > 1 - cutoff).sum(axis=0)
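
One caveat: X1 is a 76187 × 76187 array, roughly 46 GB in float64, so the full Gram matrix may not fit in RAM. If it doesn't, the same vectorized speedup can be kept by processing the rows in blocks. The sketch below is my own variant, not from the original post; the name fast_weights and the chunk size of 1024 are arbitrary choices:

import numpy as np

def fast_weights(X_flat, cutoff=0.2, chunk=1024):
    # Squared row norms, i.e. the diagonal of X_flat @ X_flat.T,
    # computed without materializing the full Gram matrix.
    norms = np.einsum('ij,ij->i', X_flat, X_flat)
    counts = np.zeros(X_flat.shape[0], dtype=np.int64)
    # Only a (chunk, n) slice of the Gram matrix is in memory at a time.
    for start in range(0, X_flat.shape[0], chunk):
        block = X_flat[start:start + chunk] @ X_flat.T
        # block[k, i] / norms[i] > 1 - cutoff mirrors the original test;
        # summing over the block's rows accumulates the per-column counts.
        counts += (block / norms > 1 - cutoff).sum(axis=0)
    return 1.0 / counts

On the same data, fast_weights(X_flat) should agree with the N_list computed above, while peak memory stays bounded by the chunk size.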
