A more efficient way to compute a distance matrix in Python

Question

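Hello everyone, I am trying to write code (in Python 2) that returns a matrix containing the distances between all pairs of rows. Below is the implementation I wrote. It works as expected, but it becomes very slow as the number of rows grows. I would therefore like to know whether anyone has suggestions for making the code more efficient for a large number of rows.

Thanks in advance.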

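import numpy as np

def gendist(x, alpha=2):
    (n, p) = x.shape
    # Number of row pairs (ii, jj) with ii < jj, i.e. n*(n-1)/2
    npairs = 0
    for ii in range(1, n):
        npairs = npairs + ii
    d = np.empty((npairs, p))
    ind = 0
    for ii in range(0, n):
        for jj in range(1, n):
            if ii < jj:
                # Element-wise difference of the two rows, raised to alpha
                d[ind, :] = (x[ii, :] - x[jj, :])**alpha
                ind = ind + 1
    return d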

Answer 1

I see that you use x.shape, so I assume you are working with NumPy.

Code:

#!/usr/bin/env python3
import numpy as np
import scipy.spatial.distance as dist

a = np.random.randint(0, 10, (5, 3))
b = dist.pdist(a)
print('Matrix:')
print(a)
print('Pdist')
for d in b:
    print(d)

Output:

Matrix:
[[4 7 6]
 [8 2 8]
 [8 3 5]
 [2 4 7]
 [0 7 5]]
Pdist
6.7082039325
5.74456264654
3.74165738677
4.12310562562
3.16227766017
6.40312423743
9.89949493661
6.40312423743
8.94427191
4.12310562562

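The pairs are ordered as (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), ...

The default metric is the Euclidean distance. See the pdist documentation for how to apply other metrics.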
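As a rough illustration of the last point, the sketch below passes other metric names to pdist and expands the condensed result into a full square matrix with squareform. It reuses the 5 x 3 matrix from the output above; the variable names are only illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Same 5 x 3 matrix as in the output above.
a = np.array([[4, 7, 6],
              [8, 2, 8],
              [8, 3, 5],
              [2, 4, 7],
              [0, 7, 5]])

condensed = pdist(a)                         # Euclidean, length n*(n-1)/2 = 10
city = pdist(a, metric='cityblock')          # Manhattan distance instead
mink3 = pdist(a, metric='minkowski', p=3)    # Minkowski distance with exponent 3

# squareform expands the condensed vector into a symmetric n x n matrix
# with zeros on the diagonal, so full[i, j] is the distance between rows i and j.
full = squareform(condensed)
print(full.shape)
print(full[0, 1])
```

The condensed form stores each pair only once, so it is usually the better choice when memory matters; squareform is mainly convenient for lookups by (i, j).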

Answer 2

If scipy is not available (for example, an Abaqus installation may give you numpy without scipy), this is a bit harder.

import numpy as np

def gendist(x, alpha=2):
    # n x n x p array filled with copies of x: xCopies[i, j] == x[i]
    xCopies = x.repeat(x.shape[0], axis=0).reshape((x.shape[0],) + x.shape)
    xVecs = xCopies - xCopies.swapaxes(0, 1)  # array of difference vectors x[i] - x[j]
    # n x n matrix of distances; 1.0/alpha avoids integer division in Python 2
    xDists = np.sum(xVecs**alpha, axis=-1)**(1.0/alpha)
    return xDists

That should be fairly robust; at least it is what I had to use.
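The same numpy-only idea can also be sketched with broadcasting instead of repeat and reshape. The function names pairwise_minkowski and condensed below are hypothetical helpers, not part of numpy, and taking the absolute value before the power is an added assumption so that odd exponents stay well defined.

```python
import numpy as np

def pairwise_minkowski(x, alpha=2):
    # diff[i, j, :] = x[i] - x[j], built by broadcasting; shape (n, n, p)
    diff = x[:, np.newaxis, :] - x[np.newaxis, :, :]
    # abs() is an added assumption: it keeps odd exponents well defined
    return np.sum(np.abs(diff) ** alpha, axis=-1) ** (1.0 / alpha)

def condensed(dists):
    # Upper triangle without the diagonal, in the order
    # (0,1), (0,2), ..., (1,2), ... used by the loop in the question
    i, j = np.triu_indices(dists.shape[0], k=1)
    return dists[i, j]

x = np.array([[4, 7, 6],
              [8, 2, 8],
              [8, 3, 5],
              [2, 4, 7],
              [0, 7, 5]], dtype=float)
d = pairwise_minkowski(x)
print(condensed(d))
```

Like the version above, this builds an n x n x p intermediate array, so memory use grows quickly with the number of rows.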

Answer 3

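I think what you are looking for is sklearn pairwise_distances. On my machine, scipy distance_matrix takes about 115 seconds to compute a 10K x 10K distance matrix on 512-dimensional vectors. scipy cdist takes about 50 seconds. sklearn pairwise_distances takes about 9 seconds. From the documentation:

Note that in the case of 'cityblock', 'cosine' and 'euclidean' (which are valid scipy.spatial.distance metrics), the scikit-learn implementation will be used, which is faster and has support for sparse matrices (except for 'cityblock').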
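One plausible reason for the speed difference in the Euclidean case is that scikit-learn documents its euclidean_distances as being computed from dot products, using dist(x, y) = sqrt(dot(x, x) - 2 * dot(x, y) + dot(y, y)), which turns most of the work into a single matrix multiplication. The sketch below shows that identity with numpy only; euclidean_via_dot is a hypothetical name, and this is not scikit-learn's actual code.

```python
import numpy as np

def euclidean_via_dot(x):
    x = np.asarray(x, dtype=float)
    sq = np.einsum('ij,ij->i', x, x)      # squared norm of every row
    gram = x.dot(x.T)                     # all pairwise dot products at once
    d2 = sq[:, np.newaxis] + sq[np.newaxis, :] - 2.0 * gram
    np.maximum(d2, 0.0, out=d2)           # clip tiny negatives from rounding
    np.fill_diagonal(d2, 0.0)             # distance of a row to itself is 0
    return np.sqrt(d2)

x = np.array([[4, 7, 6],
              [8, 2, 8],
              [8, 3, 5],
              [2, 4, 7],
              [0, 7, 5]])
print(euclidean_via_dot(x))
```

This formulation avoids the n x n x p intermediate array, at the cost of some numerical precision when rows are very close to each other.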


Answer 1
0 2016-09-22 08:29:48

Answer 2
0 2016-09-22 09:03:57

Answer 3
0 2019-02-14 18:27:27
