Computing Euclidean distance with numpy in Python

Question

I am new to Python, so this question might look trivial. However, I did not find a similar case to mine. I have a matrix of coordinates for 20 nodes. I want to compute the Euclidean distance between all pairs of nodes in this set and store them in a pairwise matrix. For example, with 20 nodes, I want the end result to be a (20, 20) matrix holding the Euclidean distance between each pair of nodes. I tried to use a for loop to go through each element of the coordinate set and compute the Euclidean distance as follows:

ncoord=numpy.matrix('3225   318;2387    989;1228    2335;57      1569;2288  8138;3514   2350;7936   314;9888    4683;6901   1834;7515   8231;709   3701;1321    8881;2290   2350;5687   5034;760    9868;2378   7521;9025   5385;4819   5943;2917   9418;3928   9770')
n=20 
c=numpy.zeros((n,n))
for i in range(0,n):
    for j in range(i+1,n):
        c[i][j]=math.sqrt((ncoord[i][0]-ncoord[j][0])**2+(ncoord[i][1]-ncoord[j][1])**2)

However, I am getting the error "input must be a square array". I wonder if anybody knows what is happening here. Thanks.

Answer 1

There are much, much faster alternatives to using nested for loops for this. I'll show you two different approaches: the first is a more general method that will introduce you to broadcasting and vectorization, and the second uses a more convenient scipy library function.

1. The general way, using broadcasting & vectorization

One of the first things I'd suggest doing is switching to np.array rather than np.matrix. Arrays are preferred for a number of reasons, most importantly because they can have more than 2 dimensions, and they make element-wise multiplication much less awkward.

import numpy as np

ncoord = np.array(ncoord)

With an array, we can eliminate the nested for loops by inserting a new singleton dimension and broadcasting the subtraction over it:

# indexing with None (or np.newaxis) inserts a new dimension of size 1
print(ncoord[:, :, None].shape)
# (20, 2, 1)

# by making the 'inner' dimensions equal to 1, i.e. (20, 2, 1) - (1, 2, 20),
# the subtraction is 'broadcast' over every pair of rows in ncoord
xydiff = ncoord[:, :, None] - ncoord[:, :, None].T

print(xydiff.shape)
# (20, 2, 20)

This is equivalent to looping over every pair of rows using nested for loops, but much, much faster!

xydiff2 = np.zeros((20, 2, 20), dtype=xydiff.dtype)
for ii in range(20):
    for jj in range(20):
        for kk in range(2):
            xydiff2[ii, kk, jj] = ncoord[ii, kk] - ncoord[jj, kk]

# check that these give the same result
print(np.all(xydiff == xydiff2))
# True

The rest we can also do using vectorized operations:

# we square the differences and sum over the 'middle' axis, equivalent to
# computing (x_i - x_j) ** 2 + (y_i - y_j) ** 2
ssdiff = (xydiff * xydiff).sum(1)

# finally we take the square root
D = np.sqrt(ssdiff)

The whole thing could be done in one line like this:

D = np.sqrt(((ncoord[:, :, None] - ncoord[:, :, None].T) ** 2).sum(1))
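The same idea is often written with the new axes placed so that the result has shape (n, n, d) and the sum runs over the last axis. The following is only a sketch of that variant on three made-up points; pairwise_euclidean is a hypothetical helper name, not a NumPy function.

```python
import numpy as np

def pairwise_euclidean(points):
    """Return the (n, n) matrix of Euclidean distances between rows of points.

    Hypothetical helper for illustration; not part of NumPy.
    """
    points = np.asarray(points, dtype=float)
    # (n, 1, d) - (1, n, d) broadcasts to (n, n, d): one difference vector per pair
    diff = points[:, None, :] - points[None, :, :]
    # square, sum over the coordinate axis, then take the square root
    return np.sqrt((diff ** 2).sum(axis=-1))

# made-up points on a line through the origin: neighbours are 5 apart, the ends 10 apart
pts = [[0, 0], [3, 4], [6, 8]]
print(pairwise_euclidean(pts))
```

Because np.asarray returns a plain array, this sketch should also accept the np.matrix from the question without changes.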

2. The lazy way, using `pdist`

It turns out that there's already a fast and convenient function for computing all pairwise distances: scipy.spatial.distance.pdist.

from scipy.spatial.distance import pdist, squareform

d = pdist(ncoord)

# pdist just returns the upper triangle of the pairwise distance matrix. to get
# the whole (20, 20) array we can use squareform:

print(d.shape)
# (190,)

D2 = squareform(d)
print(D2.shape)
# (20, 20)

# check that the two methods are equivalent
print(np.allclose(D, D2))
# True
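To make the "upper triangle" layout concrete, here is a NumPy-only sketch that extracts the entries above the diagonal in row-by-row order, which is the order the condensed vector uses, and then rebuilds the square matrix from them. The four points are made up for illustration, and np.triu_indices stands in for what pdist and squareform do internally.

```python
import numpy as np

# four made-up points
pts = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [0.0, 8.0]])
n = len(pts)

# full (n, n) distance matrix via broadcasting
diff = pts[:, None, :] - pts[None, :, :]
full = np.sqrt((diff ** 2).sum(axis=-1))

# entries strictly above the diagonal, read row by row:
# (0,1), (0,2), (0,3), (1,2), (1,3), (2,3)
rows, cols = np.triu_indices(n, k=1)
condensed = full[rows, cols]
print(condensed.shape)  # n * (n - 1) // 2 entries

# rebuild the symmetric matrix from the condensed vector
rebuilt = np.zeros((n, n))
rebuilt[rows, cols] = condensed
rebuilt[cols, rows] = condensed
print(np.allclose(rebuilt, full))
```

For n = 20 the same count gives 20 * 19 // 2 = 190 entries, matching the shape printed above.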

Answer 2

for i in range(0, n):
    for j in range(i+1, n):
        c[i, j] = math.sqrt((ncoord[i, 0] - ncoord[j, 0])**2 
        + (ncoord[i, 1] - ncoord[j, 1])**2)

Note: ncoord[i, j] is not the same as ncoord[i][j] for a Numpy matrix. This appears to be the source of confusion. If ncoord is a Numpy array then they will give the same result.

For a Numpy matrix, ncoord[i] returns the ith row of ncoord, which is itself a Numpy matrix object with shape 1 x 2 in your case. Therefore, ncoord[i][j] actually means: take the ith row of ncoord, then take the jth row of that 1 x 2 matrix. This is where your indexing problems come about when j > 0.
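A small sketch of the difference, using the first two nodes from the question. It also shows what is most likely behind the "square array" message: ncoord[i][0] is still a 1 x 2 matrix, and ** on a matrix is a matrix power, which needs a square input. The exact exception type and wording vary between NumPy versions, so the sketch catches both ValueError and LinAlgError.

```python
import numpy as np

m = np.matrix('3225 318; 2387 989')   # same kind of object as in the question
a = np.asarray(m)                      # plain ndarray with the same data

print(m[1].shape)   # (1, 2): a row of a matrix is still a 2-D matrix
print(a[1].shape)   # (2,): a row of an array is 1-D

print(m[1, 1], a[1, 1], a[1][1])       # all three pick out 989

try:
    m[1][1]         # asks for row 1 of a matrix that has only one row
    chained_ok = True
except IndexError:
    chained_ok = False

try:
    m[0][0] ** 2    # m[0][0] is still 1 x 2, so the matrix power fails
    power_ok = True
except (ValueError, np.linalg.LinAlgError):
    power_ok = False

print(chained_ok, power_ok)
```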

Regarding your comment that assigning to c[i][j] is "working": it shouldn't. At least on my build of Numpy 1.9.1 it shouldn't work if your indices i and j iterate up to n.

As an aside, remember to add the transpose of the matrix c to itself: the loop only fills the entries with j > i, so c holds just the upper triangle.
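A minimal sketch of that last step, assuming the coordinates are held in a plain array (coords and its three points are made up for illustration):

```python
import math
import numpy as np

coords = np.array([[0, 0], [3, 4], [6, 8]])
n = len(coords)

c = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):       # upper triangle only, as in the loop above
        c[i, j] = math.sqrt((coords[i, 0] - coords[j, 0]) ** 2
                            + (coords[i, 1] - coords[j, 1]) ** 2)

# the diagonal of c is zero, so adding the transpose double-counts nothing
full = c + c.T
print(full)
```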

It is recommended to use Numpy arrays instead of matrix. See this post.

If your coordinates are stored as a Numpy array, then pairwise distance can be computed as:

from scipy.spatial.distance import pdist

pairwise_distances = pdist(ncoord, metric="euclidean")

or simply

pairwise_distances = pdist(ncoord)

since the default metric is "euclidean". (The p argument belongs to the "minkowski" metric, where p=2 gives the Euclidean distance, so it is not needed here.)

In a comment below I mistakenly mentioned that the result of pdist is an n x n matrix. To get an n x n matrix, you will need to do the following:

from scipy.spatial.distance import pdist, squareform

pairwise_distances = squareform(pdist(ncoord))

or

from scipy.spatial.distance import cdist

pairwise_distances = cdist(ncoord, ncoord)

Answer 3

What I figure you wanted to do: you said you wanted a 20 by 20 matrix, but the one you coded is triangular.

Thus I coded a complete 20x20 matrix instead.

distances = []
for i in range(len(ncoord)):
    given_i = []
    for j in range(len(ncoord)):
        d_val = math.sqrt((ncoord[i, 0]-ncoord[j,0])**2+(ncoord[i,1]-ncoord[j,1])**2)
        given_i.append(d_val)

    distances.append(given_i)

    # distances[i][j] = distance from i to j

SciPy way:

from scipy.spatial.distance import cdist
# Isn't scipy nice - can also use pdist... works in the same way but different recall method.
distances = cdist(ncoord, ncoord, 'euclidean')

Answer 4

Using your own custom square root of a sum of squares is not always safe: the intermediate squares can overflow or underflow. Speed-wise, the two approaches are about the same.

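# x and y are 1-D arrays of the node coordinates, e.g. with ncoord as an np.array:
# x, y = ncoord[:, 0], ncoord[:, 1]
np.hypot(
    np.subtract.outer(x, x),
    np.subtract.outer(y, y)
)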

Underflow

i, j = 1e-200, 1e-200
np.sqrt(i**2+j**2)
# 0.0

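Overflow

# NumPy floats are used here because plain Python floats raise OverflowError on i**2
i, j = np.float64(1e+200), np.float64(1e+200)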
np.sqrt(i**2+j**2)
# inf

No Underflow

i, j = 1e-200, 1e-200
np.hypot(i, j)
# 1.414213562373095e-200

No Overflow

i, j = 1e+200, 1e+200
np.hypot(i, j)
# 1.414213562373095e+200
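Putting the pieces together, the following sketch builds the full pairwise matrix with np.hypot and np.subtract.outer, then repeats it on the same points scaled by 1e200, where squaring the differences would overflow. The points are made up for illustration.

```python
import numpy as np

pts = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
x, y = pts[:, 0], pts[:, 1]

# np.subtract.outer(x, x)[i, j] == x[i] - x[j]
D = np.hypot(np.subtract.outer(x, x), np.subtract.outer(y, y))
print(D)

# the same points scaled by 1e200: squaring would overflow, hypot does not
big = pts * 1e200
D_big = np.hypot(np.subtract.outer(big[:, 0], big[:, 0]),
                 np.subtract.outer(big[:, 1], big[:, 1]))
print(np.isfinite(D_big).all())
```

This only handles two coordinates per node, since np.hypot takes exactly two inputs.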

Refer

Computing Euclidean distance with numpy in Python

4 answers

Answer 1
27 2015-02-24 03:57:57

Answer 2
5 2015-02-24 03:11:23

Answer 3
1 2015-02-24 03:26:39

Answer 4
0 2021-09-18 10:38:00
