How to calculate the Euclidean distance between two matrices using only matrix operations in NumPy/Python (no for loops)?
How to efficiently compute Euclidean distance matrices without for loops in Python?
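I have a (51266, 20, 25, 3) (N, F, J, C) array, where N is the example index, F the frame, J the joint, and C the joint's xyz coordinates. I want to compute the Euclidean distance matrix for every frame of every example, giving an array of shape (51266, 20, 25, 25). My code is: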
```python
from sklearn.metrics.pairwise import euclidean_distances as euc
from tqdm import tqdm
import numpy as np

Examples = np.load('allExamples.npy')
theEuclideanMethod = np.zeros((0,20,25,25))
for example in tqdm(range(Examples.shape[0])):
    euclideanBox = np.zeros((0,25,25))
    for frame in range(20):
        # (25, 3) joints of one frame -> (25, 25) distance matrix
        euclideanBox = np.concatenate((euclideanBox,euc(Examples[example,frame,:,:])[np.newaxis,...]),axis=0)
    euclideanBox = euclideanBox[np.newaxis,...]
    # np.concatenate copies the whole accumulated array on every iteration
    theEuclideanMethod = np.concatenate((theEuclideanMethod,euclideanBox))
np.save("Euclidean examples.npy",theEuclideanMethod)
print(theEuclideanMethod.shape,"Euclidean shape")
```
The problem is that the for loops make this extremely slow. How else could I modify my code so that it runs faster?
You can use array broadcasting, as follows:
```python
import numpy as np

examples = np.random.uniform(size=(5, 6, 7, 3))
N, F, J, C = examples.shape
# deltas.shape == (N, F, J, J, C) - Cartesian deltas
deltas = examples.reshape(N, F, J, 1, C) - examples.reshape(N, F, 1, J, C)
# distances.shape == (N, F, J, J)
distances = np.sqrt((deltas**2).sum(axis=-1), dtype=np.float32)
del deltas  # release memory (only needed for interactive use)
```
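As a sanity check, here is a minimal sketch that compares the broadcast computation with a plain loop over examples, frames and joint pairs on small random data. The helper names `pairwise_loop` and `pairwise_broadcast` are hypothetical, introduced only for this comparison; `np.linalg.norm(..., axis=-1)` is used as an equivalent of `np.sqrt((deltas**2).sum(axis=-1))`.

```python
import numpy as np

def pairwise_loop(points):
    """Hypothetical reference helper: explicit loops, slow but obviously correct."""
    N, F, J, _ = points.shape
    out = np.empty((N, F, J, J))
    for n in range(N):
        for f in range(F):
            for a in range(J):
                for b in range(J):
                    out[n, f, a, b] = np.sqrt(np.sum((points[n, f, a] - points[n, f, b]) ** 2))
    return out

def pairwise_broadcast(points):
    """Hypothetical helper: (N, F, J, 1, C) - (N, F, 1, J, C) -> (N, F, J, J, C), then norm over C."""
    deltas = points[:, :, :, None, :] - points[:, :, None, :, :]
    return np.linalg.norm(deltas, axis=-1)

rng = np.random.default_rng(0)
examples = rng.uniform(size=(2, 3, 4, 3))
fast = pairwise_broadcast(examples)
slow = pairwise_loop(examples)
print(fast.shape)               # (2, 3, 4, 4)
print(np.allclose(fast, slow))  # True
```

Each (J, J) slice is symmetric with a zero diagonal, which is a cheap property to check on real data as well.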
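This is rather memory-hungry: with the values of N, F, J, C you mention, the intermediate result (`deltas`) takes roughly 15 GB in double precision. It is more efficient (6 times less memory, and better cache usage) to preallocate the output array in single precision and loop over the N axis: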
```python
# Preallocate the output in single precision (float32)
distances = np.empty((N, F, J, J), dtype=np.float32)
for i, ex in enumerate(examples):
    # deltas.shape = (F, J, J, C) - Cartesian deltas
    deltas = ex.reshape(F, J, 1, C) - ex.reshape(F, 1, J, C)
    distances[i] = np.sqrt((deltas**2).sum(axis=-1))
```
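The memory estimates above can be checked by multiplying out the array shapes for the sizes in the question. This sketch only does the arithmetic and allocates nothing large.

```python
import numpy as np

N, F, J, C = 51266, 20, 25, 3

# Full intermediate `deltas`, shape (N, F, J, J, C), float64
deltas_bytes = N * F * J * J * C * np.dtype(np.float64).itemsize
# Preallocated output, shape (N, F, J, J), float32
out_bytes = N * F * J * J * np.dtype(np.float32).itemsize
# Per-example `deltas` inside the loop, shape (F, J, J, C), float64
per_example_bytes = F * J * J * C * np.dtype(np.float64).itemsize

print(f"deltas: {deltas_bytes / 1e9:.1f} GB")            # deltas: 15.4 GB
print(f"output: {out_bytes / 1e9:.2f} GB")               # output: 2.56 GB
print(f"ratio: {deltas_bytes / out_bytes:.0f}x")         # ratio: 6x
print(f"per-example deltas: {per_example_bytes / 1e3:.0f} kB")  # per-example deltas: 300 kB
```

The factor of 6 comes from C = 3 coordinates times the 8-byte/4-byte ratio of float64 to float32, and the per-example temporary is small enough to stay cache-friendly.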
This should run very fast. float32 is used to keep memory usage low, but it is optional. Increase `batch_size` for more speed, or decrease it to reduce memory usage.
```python
import numpy as np

# Adjust batch_size depending on your memory
batch_size = 500

# Make some fake data
x = np.random.randn(51266, 20, 25, 3).astype(np.float32)
# y stands in for a second point set of the same shape;
# for the question's within-frame distances, use y = x
y = np.random.randn(51266, 20, 25, 3).astype(np.float32)

# Distance matrix, shape (51266, 20, 25, 25)
d = np.empty(x.shape[:-1] + (x.shape[-2],), dtype=np.float32)

# Number of batches (ceiling division)
n_batches = (x.shape[0] - 1) // batch_size + 1
for i in range(n_batches):
    sl = slice(i * batch_size, (i + 1) * batch_size)
    # (B, F, J, 1, C) - (B, F, 1, J, C) -> (B, F, J, J, C), then reduce over C
    d[sl] = np.sqrt(np.sum((x[sl, :, :, None] - y[sl, :, None, :])**2, axis=-1))
```
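A different technique, not used in the answers above, avoids the (..., J, J, C) intermediate entirely by expanding ‖a − b‖² = ‖a‖² + ‖b‖² − 2 a·b and computing the dot products with a batched matrix product (`@` broadcasts over the leading N and F axes). The function name `pairwise_gram` is hypothetical. This is a sketch that assumes a small loss of precision is acceptable: rounding can make the squared distance slightly negative, so it is clipped at zero before the square root.

```python
import numpy as np

def pairwise_gram(points):
    """Pairwise Euclidean distances over the last two axes via the Gram-matrix identity.

    points: array of shape (..., J, C); returns an array of shape (..., J, J).
    """
    sq_norms = np.sum(points ** 2, axis=-1)                # (..., J) squared norms
    gram = points @ np.swapaxes(points, -1, -2)            # (..., J, J) dot products a.b
    sq_dists = sq_norms[..., :, None] + sq_norms[..., None, :] - 2.0 * gram
    np.maximum(sq_dists, 0.0, out=sq_dists)                # clip tiny negatives from rounding
    return np.sqrt(sq_dists)

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 20, 25, 3))
d = pairwise_gram(x)
# Direct-difference reference, as in the answers above
ref = np.linalg.norm(x[..., :, None, :] - x[..., None, :, :], axis=-1)
print(d.shape)                          # (4, 20, 25, 25)
print(np.allclose(d, ref, atol=1e-6))   # True
```

The trade-off is numerical: when points are far from the origin relative to their separation, the subtraction cancels and loses precision, so the direct-difference version is more accurate. With only 25 joints in 3-D the difference approach is already cheap; the Gram form mainly pays off when J or C is large.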