Optimise Euclidean distance matrix algorithm if only interested in closest points

The following Euclidean distance algorithm creates an MxM matrix of distances between the rows of an MxN input matrix (representing points in some N-dimensional space). Its running time scales as O(M^2) (more precisely O(M^2 N)). Can this be improved if I am only interested in the rows (i.e. points) that are closest to each other? (My downstream task consists of performing K-NN, amongst other things.)

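import numpy as np


vectors = np.random.randn(100, 20)  # M = 100 points in N = 20 dimensions
m = vectors.shape[0]

distances = np.zeros([m, m])
for i in range(m):
    vec = vectors[i]
    # Distance from point i to every point j (itself included), one Python-level norm per pair
    distances[i] = [np.linalg.norm(vec - vectors[j]) for j in range(m)]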

I would suggest using scipy's condensed distance matrix instead of the for-loop of pairwise comparisons. In particular,

from scipy.spatial.distance import pdist, squareform
distances = squareform(pdist(vectors))

provides a ~85x speedup! The documentation can be found here.

Fundamentally, the complexity remains quadratic, since every row of vectors still has to be compared with every other row. However, the implementation exploits symmetry and the fact that the distance of every element to itself is 0: it computes only the strict upper triangle (M(M-1)/2 distances) and then mirrors it across the diagonal to obtain the square distance matrix. Symmetry alone can at most halve the work, so most of the speedup comes from the pairwise loop running in compiled code rather than in Python.
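To make the upper-triangle idea concrete, here is a minimal plain-NumPy sketch (not SciPy's actual implementation). It computes one distance per unordered pair (i, j) with i < j, which is the "condensed" form, and then mirrors those values into a square matrix, which is what squareform does. The random input and the seed are illustrative assumptions.

```python
import numpy as np

# Illustrative input: M = 100 points in N = 20 dimensions (seed chosen arbitrarily)
rng = np.random.default_rng(0)
vectors = rng.standard_normal((100, 20))
m = vectors.shape[0]

# Index pairs (i, j) with i < j: the strict upper triangle, skipping the zero diagonal.
# np.triu_indices returns them in row-major order, one entry per unordered pair.
i, j = np.triu_indices(m, k=1)

# Condensed form: m * (m - 1) / 2 distances, one per unordered pair
condensed = np.linalg.norm(vectors[i] - vectors[j], axis=1)

# Expand to the square matrix by mirroring across the diagonal (diagonal stays 0)
square = np.zeros((m, m))
square[i, j] = condensed
square[j, i] = condensed

print(condensed.shape)  # (4950,)
```

For M = 100 the condensed vector holds 100 * 99 / 2 = 4950 distances instead of 10,000 matrix entries.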

Your code ran in 71 ms, while the SciPy version ran in 0.83 ms. A detailed performance comparison can be found in this thread.
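If SciPy is not an option, a loop-free NumPy sketch using broadcasting produces the same matrix as the original loop. It is an alternative technique, not what pdist does internally: it still computes all M^2 pairs (no symmetry trick) and materialises an M x M x N array of differences, so memory grows as M^2 N. The random input is an illustrative assumption.

```python
import numpy as np

# Illustrative input: M = 100 points in N = 20 dimensions (seed chosen arbitrarily)
rng = np.random.default_rng(0)
vectors = rng.standard_normal((100, 20))

# Broadcast (M, 1, N) - (1, M, N) -> (M, M, N) pairwise differences,
# then take the Euclidean norm along the last axis.
diff = vectors[:, np.newaxis, :] - vectors[np.newaxis, :, :]
distances = np.linalg.norm(diff, axis=-1)

print(distances.shape)  # (100, 100)
```

This moves the pairwise loop into compiled NumPy code, which is where most of the speedup comes from; for large M the intermediate array becomes the limiting factor.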

Regardless, if you are going to run k-NN, you might want to consider scikit-learn, where you can simply pass vectors as X, as shown here.
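A minimal sketch of that scikit-learn route, assuming k = 5 neighbours (a hypothetical choice) and random input. NearestNeighbors with algorithm="kd_tree" builds a spatial tree, so queries do not need the full M x M matrix. Tree-based search tends to lose its advantage as the dimension grows, so at N = 20 it may not beat brute force; algorithm="auto" lets scikit-learn pick.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Illustrative input: M = 100 points in N = 20 dimensions (seed chosen arbitrarily)
rng = np.random.default_rng(0)
vectors = rng.standard_normal((100, 20))
k = 5  # hypothetical number of neighbours

# Fit a KD-tree over the points
nn = NearestNeighbors(n_neighbors=k, algorithm="kd_tree").fit(vectors)

# Called without X, kneighbors() returns neighbours of the fitted points
# and does not count each point as its own neighbour.
dist, idx = nn.kneighbors()

print(dist.shape, idx.shape)  # (100, 5) (100, 5)
```

Each row of idx holds the indices of the k closest other points, with dist sorted in ascending order.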
