Computing Euclidean distance for numpy in python

I am new to Python, so this question might look trivial. However, I did not find a similar case to mine. I have a matrix of coordinates for 20 nodes. I want to compute the Euclidean distance between all pairs of nodes from this set and store them in a pairwise matrix. For example, if I have 20 nodes, I want the end result to be a (20, 20) matrix holding the Euclidean distance between each pair of nodes. I tried to use a for loop to go through each element of the coordinate set and compute the Euclidean distance as follows:

ncoord=numpy.matrix('3225   318;2387    989;1228    2335;57      1569;2288  8138;3514   2350;7936   314;9888    4683;6901   1834;7515   8231;709   3701;1321    8881;2290   2350;5687   5034;760    9868;2378   7521;9025   5385;4819   5943;2917   9418;3928   9770')
n=20 
c=numpy.zeros((n,n))
for i in range(0,n):
    for j in range(i+1,n):
        c[i][j]=math.sqrt((ncoord[i][0]-ncoord[j][0])**2+(ncoord[i][1]-ncoord[j][1])**2)

However, I am getting the error "input must be a square array". I wonder if anybody knows what is happening here. Thanks

There are much, much faster alternatives to using nested for loops for this. I'll show you two different approaches: the first is a more general method that will introduce you to broadcasting and vectorization, and the second uses a more convenient scipy library function.


1. The general way, using broadcasting & vectorization

One of the first things I'd suggest doing is switching to np.array rather than np.matrix. Arrays are preferred for a number of reasons, most importantly because they can have more than 2 dimensions and they make element-wise multiplication much less awkward.

import numpy as np

ncoord = np.array(ncoord)

With an array, we can eliminate the nested for loops by inserting a new singleton dimension and broadcasting the subtraction over it:

# indexing with None (or np.newaxis) inserts a new dimension of size 1
print(ncoord[:, :, None].shape)
# (20, 2, 1)

# by making the 'inner' dimensions equal to 1, i.e. (20, 2, 1) - (1, 2, 20),
# the subtraction is 'broadcast' over every pair of rows in ncoord
xydiff = ncoord[:, :, None] - ncoord[:, :, None].T

print(xydiff.shape)
# (20, 2, 20)

This is equivalent to looping over every pair of rows using nested for loops, but much, much faster!

xydiff2 = np.zeros((20, 2, 20), dtype=xydiff.dtype)
for ii in range(20):
    for jj in range(20):
        for kk in range(2):
            xydiff2[ii, kk, jj] = ncoord[ii, kk] - ncoord[jj, kk]

# check that these give the same result
print(np.all(xydiff == xydiff2))
# True

The rest we can also do using vectorized operations:

# we square the differences and sum over the 'middle' axis, equivalent to
# computing (x_i - x_j) ** 2 + (y_i - y_j) ** 2
ssdiff = (xydiff * xydiff).sum(1)

# finally we take the square root
D = np.sqrt(ssdiff)

The whole thing could be done in one line like this:

D = np.sqrt(((ncoord[:, :, None] - ncoord[:, :, None].T) ** 2).sum(1))
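
The same broadcast is often written with the new axis placed between the point and coordinate axes, which avoids the transpose. Here is a minimal sketch of that variant, assuming a small hypothetical set of points `pts` (not the coordinates from the question) chosen so the result can be checked by hand:

```python
import numpy as np

# Hypothetical example points (not from the question): a 3-4-5 right triangle.
pts = np.array([[0.0, 0.0],
                [3.0, 0.0],
                [0.0, 4.0]])

# (3, 1, 2) - (1, 3, 2) broadcasts to (3, 3, 2): one difference vector per pair
diff = pts[:, None, :] - pts[None, :, :]

# Summing the squares over the last (coordinate) axis and taking the square
# root gives the (3, 3) pairwise distance matrix
D = np.sqrt((diff ** 2).sum(axis=-1))

print(D.shape)
# (3, 3)
```

Here D[0, 1] is 3, D[0, 2] is 4 and D[1, 2] is 5, with zeros on the diagonal.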

2. The lazy way, using pdist

It turns out that there's already a fast and convenient function for computing all pairwise distances: scipy.spatial.distance.pdist.

from scipy.spatial.distance import pdist, squareform

d = pdist(ncoord)

# pdist just returns the upper triangle of the pairwise distance matrix. to get
# the whole (20, 20) array we can use squareform:

print(d.shape)
# (190,)

D2 = squareform(d)
print(D2.shape)
# (20, 20)

# check that the two methods are equivalent
print(np.all(D == D2))
# True

for i in range(0, n):
    for j in range(i+1, n):
        c[i, j] = math.sqrt((ncoord[i, 0] - ncoord[j, 0])**2
                            + (ncoord[i, 1] - ncoord[j, 1])**2)

Note: ncoord[i, j] is not the same as ncoord[i][j] for a Numpy matrix. This appears to be the source of confusion. If ncoord is a Numpy array then they will give the same result.

For a Numpy matrix, ncoord[i] returns the ith row of ncoord, which is itself a Numpy matrix object, with shape 1 x 2 in your case. Therefore, ncoord[i][j] actually means: take the ith row of ncoord, then take the jth row of that 1 x 2 matrix. This is where your indexing problems come from when j > 0.
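
The difference can be seen on a tiny case. The following is only a sketch, using the first two nodes from the question; the variable names `m`, `a` and `failed` are chosen here for illustration, and the exact error text depends on the Numpy version:

```python
import numpy as np

m = np.matrix('3225 318; 2387 989')   # first two nodes from the question
a = np.asarray(m)                     # the same data as a plain ndarray

# On a matrix, m[0] is still 2-D, so m[0][0] just selects that same row again
print(m[0].shape)      # (1, 2)
print(m[0][0].shape)   # (1, 2)

# On an array, a[0] is 1-D, so a[0][1] and a[0, 1] agree with m[0, 1]
print(a[0][1], a[0, 1], m[0, 1])   # 318 318 318

# ** on a matrix is a matrix power, which requires a square matrix,
# so squaring the (1, 2) difference of two rows fails
try:
    (m[0][0] - m[1][0]) ** 2
    failed = False
except (ValueError, np.linalg.LinAlgError) as exc:
    failed = True
    print("error:", exc)
```

That failed matrix power on a 1 x 2 operand is consistent with the "input must be a square array" message reported in the question.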

Regarding your comment that assigning to c[i][j] "works": it shouldn't. At least on my build of Numpy 1.9.1, it shouldn't work if your indices i and j iterate up to n.

As an aside, remember to add the transpose of the matrix c to itself, since the loop only fills the upper triangle.
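
A minimal sketch of that last step, assuming a small hypothetical array `pts` in place of the question's coordinates (the names `pts` and `c_full` are illustrative only):

```python
import math
import numpy as np

# Hypothetical points (not from the question): a 3-4-5 right triangle
pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
n = len(pts)

# Fill only the upper triangle, as in the loop above
c = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        c[i, j] = math.sqrt((pts[i, 0] - pts[j, 0]) ** 2
                            + (pts[i, 1] - pts[j, 1]) ** 2)

# The diagonal is zero, so adding the transpose fills the lower triangle
# without double-counting any entry
c_full = c + c.T
```

After this, c_full[i, j] equals c_full[j, i] for every pair of nodes.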

It is recommended to use Numpy arrays instead of matrices. See this post.

If your coordinates are stored as a Numpy array, then the pairwise distances can be computed as:

from scipy.spatial.distance import pdist

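pairwise_distances = pdist(ncoord, metric="euclidean")

or simply

pairwise_distances = pdist(ncoord)

since the default metric is "euclidean". The p argument belongs to the "minkowski" metric, where p=2 gives the Euclidean distance; passing it alongside "euclidean" is unnecessary and may raise an error in newer SciPy releases.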

In a comment below I mistakenly mentioned that the result of pdist is an n x n matrix. To get an n x n matrix, you will need to do the following:

from scipy.spatial.distance import pdist, squareform

pairwise_distances = squareform(pdist(ncoord))

or

from scipy.spatial.distance import cdist

pairwise_distances = cdist(ncoord, ncoord)
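
To make the difference between the two return shapes concrete, here is a small sketch assuming four hypothetical points `pts` (not the question's data). pdist returns one value per unordered pair, n * (n - 1) / 2 in total, while squareform and cdist give the full n x n matrix:

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist, squareform

# Hypothetical points (not from the question): corners of a 3 x 4 rectangle
pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])

# Condensed form, ordered as pairs (0,1), (0,2), (0,3), (1,2), (1,3), (2,3)
condensed = pdist(pts)
print(condensed.shape)
# (6,)

# Expanding the condensed vector gives the same matrix as cdist
square = squareform(condensed)
print(square.shape)
# (4, 4)
print(np.allclose(square, cdist(pts, pts)))
# True
```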

What I figure you wanted to do: you said you wanted a 20 by 20 matrix, but the one you coded is triangular.

Thus I coded a complete 20x20 matrix instead.

import math

distances = []
for i in range(len(ncoord)):
    given_i = []
    for j in range(len(ncoord)):
        d_val = math.sqrt((ncoord[i, 0]-ncoord[j,0])**2+(ncoord[i,1]-ncoord[j,1])**2)
        given_i.append(d_val)

    distances.append(given_i)

    # distances[i][j] = distance from i to j

SciPy way:

from scipy.spatial.distance import cdist
# Isn't scipy nice - pdist can also be used; it computes the same distances but returns them as a condensed 1-D array.
distances = cdist(ncoord, ncoord, 'euclidean')

Using your own hand-written square root of a sum of squares is not always safe: the intermediate squares can overflow or underflow. np.hypot avoids this, and speed-wise the two are the same.

np.hypot(
    np.subtract.outer(x, x),
    np.subtract.outer(y, y)
)
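
The snippet above does not define x and y. A minimal runnable sketch follows, assuming they are the 1-D x and y coordinate columns of the points and using a hypothetical array `pts` rather than the question's data:

```python
import numpy as np

# Hypothetical points (not from the question): a 3-4-5 right triangle
pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])

# Assumed meaning of x and y: the 1-D coordinate columns
x, y = pts[:, 0], pts[:, 1]

# np.subtract.outer(x, x)[i, j] is x[i] - x[j], so hypot of the two outer
# differences is the Euclidean distance between points i and j
D = np.hypot(np.subtract.outer(x, x), np.subtract.outer(y, y))

print(D.shape)
# (3, 3)
```

If the coordinates are held in an np.matrix, converting with np.asarray first keeps the columns 1-D, which is what this sketch relies on.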

Underflow

i, j = 1e-200, 1e-200
np.sqrt(i**2+j**2)
# 0.0

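Overflow

# np.float64 is used here because squaring a plain Python float this large
# raises OverflowError instead of returning inf
i, j = np.float64(1e+200), np.float64(1e+200)
np.sqrt(i**2+j**2)
# inf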

No Underflow

i, j = 1e-200, 1e-200
np.hypot(i, j)
# 1.414213562373095e-200

No Overflow

i, j = 1e+200, 1e+200
np.hypot(i, j)
# 1.414213562373095e+200

