Computing Euclidean distance for numpy in python
I am new to Python, so this question might look trivial. However, I did not find a similar case to mine. I have a matrix of coordinates for 20 nodes. I want to compute the Euclidean distance between all pairs of nodes from this set and store them in a pairwise matrix. For example, if I have 20 nodes, I want the end result to be a matrix of shape (20, 20) holding the Euclidean distance between each pair of nodes. I tried to use a for loop to go through each element of the coordinate set and compute the Euclidean distance as follows:
import math
import numpy
ncoord=numpy.matrix('3225 318;2387 989;1228 2335;57 1569;2288 8138;3514 2350;7936 314;9888 4683;6901 1834;7515 8231;709 3701;1321 8881;2290 2350;5687 5034;760 9868;2378 7521;9025 5385;4819 5943;2917 9418;3928 9770')
n=20
c=numpy.zeros((n,n))
for i in range(0,n):
    for j in range(i+1,n):
        c[i][j]=math.sqrt((ncoord[i][0]-ncoord[j][0])**2+(ncoord[i][1]-ncoord[j][1])**2)
However, I am getting the error "input must be a square array". I wonder if anybody knows what is happening here.
Thanks
There are much, much faster alternatives to using nested for loops for this. I'll show you two different approaches: the first is a more general method that will introduce you to broadcasting and vectorization, and the second uses a more convenient scipy library function.
One of the first things I'd suggest doing is switching to using np.array rather than np.matrix. Arrays are preferred for a number of reasons, most importantly because they can have more than 2 dimensions, and they make element-wise multiplication much less awkward.
import numpy as np
ncoord = np.array(ncoord)
With an array, we can eliminate the nested for loops by inserting a new singleton dimension and broadcasting the subtraction over it:
# indexing with None (or np.newaxis) inserts a new dimension of size 1
print(ncoord[:, :, None].shape)
# (20, 2, 1)
# by making the 'inner' dimensions equal to 1, i.e. (20, 2, 1) - (1, 2, 20),
# the subtraction is 'broadcast' over every pair of rows in ncoord
xydiff = ncoord[:, :, None] - ncoord[:, :, None].T
print(xydiff.shape)
# (20, 2, 20)
This is equivalent to looping over every pair of rows using nested for loops, but much, much faster!
xydiff2 = np.zeros((20, 2, 20), dtype=xydiff.dtype)
for ii in range(20):
    for jj in range(20):
        for kk in range(2):
            # fill the loop-based copy (xydiff2), not the broadcast result
            xydiff2[ii, kk, jj] = ncoord[ii, kk] - ncoord[jj, kk]
# check that these give the same result
print(np.all(xydiff == xydiff2))
# True
The rest we can also do using vectorized operations:
# we square the differences and sum over the 'middle' axis, equivalent to
# computing (x_i - x_j) ** 2 + (y_i - y_j) ** 2
ssdiff = (xydiff * xydiff).sum(1)
# finally we take the square root
D = np.sqrt(ssdiff)
The whole thing could be done in one line like this:
D = np.sqrt(((ncoord[:, :, None] - ncoord[:, :, None].T) ** 2).sum(1))
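The same broadcasting idea can be wrapped in a small helper. This is only a sketch: the name pairwise_euclidean and the three demo points are made up for illustration, and it puts the new axis in a different position from the one-liner above (the coordinate axis ends up last), which some readers find easier to follow.

```python
import numpy as np

def pairwise_euclidean(points):
    """Return the (n, n) matrix of Euclidean distances between the rows of points."""
    pts = np.asarray(points, dtype=float)       # also accepts np.matrix or nested lists
    diff = pts[:, None, :] - pts[None, :, :]    # shape (n, n, d): every row minus every row
    return np.sqrt((diff ** 2).sum(axis=-1))    # sum squared differences over the coordinate axis

demo = np.array([[0, 0], [3, 4], [6, 8]])       # hypothetical points, chosen so distances are 5 and 10
D_demo = pairwise_euclidean(demo)
print(D_demo)
```

Note that the intermediate array has n * n * d elements, so for very large n the memory cost of this approach grows quickly.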
pdist
It turns out that there's already a fast and convenient function for computing all pairwise distances: scipy.spatial.distance.pdist.
from scipy.spatial.distance import pdist, squareform
d = pdist(ncoord)
# pdist just returns the upper triangle of the pairwise distance matrix. to get
# the whole (20, 20) array we can use squareform:
print(d.shape)
# (190,)
D2 = squareform(d)
print(D2.shape)
# (20, 20)
# check that the two methods are equivalent
print(np.allclose(D, D2))
# True
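To see how the condensed vector from pdist lines up with the square matrix, here is a small sketch on four made-up points. It relies on the condensed form listing the pairs (i, j) with i < j in row-major order, which is the same order np.triu_indices produces with k=1.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

pts = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [0.0, 8.0]])  # hypothetical points
n = len(pts)

condensed = pdist(pts)                 # length n * (n - 1) / 2 = 6
full = squareform(condensed)           # symmetric (4, 4) matrix with a zero diagonal

# Row-major upper-triangle pairs: (0,1), (0,2), (0,3), (1,2), (1,3), (2,3)
rows, cols = np.triu_indices(n, k=1)
for k, (i, j) in enumerate(zip(rows, cols)):
    print(f"condensed[{k}] = distance({i}, {j}) = {condensed[k]:.3f}")
```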
for i in range(0, n):
    for j in range(i+1, n):
        c[i, j] = math.sqrt((ncoord[i, 0] - ncoord[j, 0])**2
                            + (ncoord[i, 1] - ncoord[j, 1])**2)
Note: ncoord[i, j] is not the same as ncoord[i][j] for a Numpy matrix. This appears to be the source of confusion. If ncoord is a Numpy array then they will give the same result.
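A minimal sketch of the difference, using only the first two rows of the question's data (the names m and a are chosen here for illustration):

```python
import numpy as np

m = np.matrix('3225 318; 2387 989')   # first two rows of the question's data
a = np.asarray(m)                     # same values as a plain ndarray

print(m[0].shape)      # (1, 2): a row of a matrix is still 2-D
print(a[0].shape)      # (2,):   a row of an array is 1-D
print(m[0, 1], a[0][1], a[0, 1])      # all three pick out 318

try:
    m[0][1]            # asks for row 1 of a 1 x 2 matrix, which does not exist
except IndexError as exc:
    print("IndexError:", exc)
```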
For a Numpy matrix, ncoord[i] returns the ith row of ncoord, which itself is a Numpy matrix object with shape 1 x 2 in your case. Therefore, ncoord[i][j] actually means: take the ith row of ncoord and take the jth row of that 1 x 2 matrix. This is where your indexing problems come about when j > 0.
Regarding your comment that assigning to c[i][j] is "working": it shouldn't. At least on my build of Numpy 1.9.1 it shouldn't work if your indices i and j iterate up to n.
As an aside, remember to add the transpose of the matrix c to itself, since the loop fills only the upper triangle.
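A small sketch of that last step, on three made-up points (coords, upper and full are names chosen here, not from the question). Because the diagonal and lower triangle start as zeros, adding the transpose mirrors the upper triangle without double-counting anything.

```python
import math
import numpy as np

coords = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])  # hypothetical points
n = len(coords)

upper = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):                  # strictly above the diagonal
        upper[i, j] = math.hypot(coords[i, 0] - coords[j, 0],
                                 coords[i, 1] - coords[j, 1])

full = upper + upper.T                         # diagonal stays 0, lower triangle is mirrored
print(full)
```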
It is recommended to use Numpy arrays instead of matrix. See this post.
If your coordinates are stored as a Numpy array, then pairwise distance can be computed as:
from scipy.spatial.distance import pdist
pairwise_distances = pdist(ncoord, metric="euclidean")
or simply
pairwise_distances = pdist(ncoord)
since the default metric is "euclidean". (The p argument belongs to the "minkowski" metric, where p=2 is the Euclidean case, so it is not passed here.)
In a comment below I mistakenly mentioned that the result of pdist is an n x n matrix. To get an n x n matrix, you will need to do the following:
from scipy.spatial.distance import pdist, squareform
pairwise_distances = squareform(pdist(ncoord))
or
from scipy.spatial.distance import cdist
pairwise_distances = cdist(ncoord, ncoord)
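As a quick sanity check, the two forms can be compared on a few made-up points. The second half of this sketch goes slightly beyond the question: it shows that cdist also accepts two different point sets (the array named others is hypothetical), in which case the result is rectangular.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist, squareform

pts = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])   # hypothetical points
same = np.allclose(squareform(pdist(pts)), cdist(pts, pts))
print(same)

others = np.array([[0.0, 8.0], [6.0, 0.0]])            # a second, hypothetical point set
cross = cdist(pts, others)                             # shape (3, 2): rows from pts, columns from others
print(cross.shape)
```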
What I figure you wanted to do: you said you wanted a 20 by 20 matrix, but the one you coded is triangular.
Thus I coded a complete 20x20 matrix instead.
distances = []
for i in range(len(ncoord)):
    given_i = []
    for j in range(len(ncoord)):
        d_val = math.sqrt((ncoord[i, 0]-ncoord[j,0])**2+(ncoord[i,1]-ncoord[j,1])**2)
        given_i.append(d_val)
    distances.append(given_i)
# distances[i][j] = distance from i to j
SciPy way:
from scipy.spatial.distance import cdist
# Isn't scipy nice - pdist also works, but it returns the condensed (upper-triangle) form rather than the full matrix.
distances = cdist(ncoord, ncoord, 'euclidean')
Using your own custom square root of a sum of squares is not always safe: the intermediate squares can overflow or underflow. np.hypot avoids this, and speed-wise the two are the same.
# x and y are assumed to be 1-D arrays holding the x- and y-coordinates
np.hypot(
    np.subtract.outer(x, x),
    np.subtract.outer(y, y)
)
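# np.float64 is used so that overflow yields inf; a plain Python float raises OverflowError on 1e+200**2
i, j = np.float64(1e-200), np.float64(1e-200)
np.sqrt(i**2+j**2)
# 0.0
i, j = np.float64(1e+200), np.float64(1e+200)
np.sqrt(i**2+j**2)
# inf
i, j = np.float64(1e-200), np.float64(1e-200)
np.hypot(i, j)
# 1.414213562373095e-200
i, j = np.float64(1e+200), np.float64(1e+200)
np.hypot(i, j)
# 1.414213562373095e+200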
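Putting the two pieces together, a sketch of the full pairwise matrix via np.hypot on made-up points (pts, dist and dist_big are names chosen here). The second part scales the points by 1e200 to show that the result stays finite where squaring first would not.

```python
import numpy as np

pts = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])   # hypothetical points
x, y = pts[:, 0], pts[:, 1]

dx = np.subtract.outer(x, x)     # dx[i, j] = x[i] - x[j]
dy = np.subtract.outer(y, y)     # dy[i, j] = y[i] - y[j]
dist = np.hypot(dx, dy)          # element-wise sqrt(dx**2 + dy**2) without the unsafe intermediate
print(dist)

# Same computation at an extreme scale, where dx**2 + dy**2 would overflow to inf
big = pts * 1e200
dist_big = np.hypot(np.subtract.outer(big[:, 0], big[:, 0]),
                    np.subtract.outer(big[:, 1], big[:, 1]))
print(np.isfinite(dist_big).all())
```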