
Calculate Euclidean distance with numpy

I have a point set whose coordinates are stored in three separate arrays (xa, ya, za). I want to calculate the Euclidean distance between each point of this set (xa[0], ya[0], za[0] and so on) and all the points of another point set (xb, yb, zb), and each time store the minimum distance in a new array.

Let's say that xa.shape = (11,), ya.shape = (11,), za.shape = (11,) and, correspondingly, xb.shape = (13,), yb.shape = (13,), zb.shape = (13,). What I want to do is take one point (xa[i], ya[i], za[i]) at a time, calculate its distance to every point in xb, yb, zb, and store the minimum value in an array xfinal with xfinal.shape = (11,).

Do you think this would be possible with numpy?

A different solution would be to use the spatial module from scipy, the KDTree in particular.

This class is built from a set of data and can then be queried with new points:

import numpy as np
from scipy.spatial import KDTree
# create some fake data
x = np.arange(20)
y = np.random.rand(20)
z = x**2
# put them together; the result should have shape (n_points, n_dimensions)
data = np.vstack([x, y, z]).T
# create the KDTree
kd = KDTree(data)

Now, if you have a point, you can ask for the distance to and the index of the closest point (or the N closest points) simply by doing:

kd.query([1, 2, 3])
# (1.8650720813822905, 2)
# your result may differ
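For the "N closest points" case mentioned above, the query method takes a k argument. Here is a minimal sketch; the random data, the seed, and the query point are made up purely for illustration:

```python
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(0)       # fixed seed, illustration only
data = rng.random((20, 3))           # 20 made-up points in 3D
kd = KDTree(data)

# k=3 asks for the three nearest neighbours of a single point;
# both results are arrays of length 3, sorted by increasing distance
dists, idxs = kd.query([0.5, 0.5, 0.5], k=3)
```

Here data[idxs] gives the coordinates of the three neighbours.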

or, given an array of positions:

# bogus positions
x2 = np.random.rand(20) * 20
y2 = np.random.rand(20) * 20
z2 = np.random.rand(20) * 20
# join them together as the input
data2 = np.vstack([x2, y2, z2]).T
# query them
kd.query(data2)

#(array([ 14.96118553,   9.15924813,  16.08269197,  21.50037074,
#    18.14665096,  13.81840533,  17.464429  ,  13.29368755,
#    20.22427196,   9.95286671,   5.326888  ,  17.00112683,
#     3.66931946,  20.370496  ,  13.4808055 ,  11.92078034,
#     5.58668204,  20.20004206,   5.41354322,   4.25145521]),
#array([4, 3, 2, 4, 2, 2, 4, 2, 3, 3, 2, 3, 4, 4, 3, 3, 3, 4, 4, 4]))
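Applied to the arrays from the question, the same idea might look like the sketch below. The random coordinates stand in for the real xa, ya, za and xb, yb, zb, and the name nearest_b is only illustrative:

```python
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(0)        # stand-in data, illustration only
xa, ya, za = rng.random((3, 11))      # 11 points of the first set
xb, yb, zb = rng.random((3, 13))      # 13 points of the second set

# build the tree on the b points, then query it with every a point
tree = KDTree(np.column_stack([xb, yb, zb]))
xfinal, nearest_b = tree.query(np.column_stack([xa, ya, za]))
# xfinal[i] is the distance from a point i to its closest b point,
# nearest_b[i] is the index of that b point
```

For sets this small a tree is unlikely to beat plain numpy; it tends to pay off when the b set is large.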

You can calculate the difference from each xa to each xb with np.subtract.outer(xa, xb). The distance to the nearest xb is given by

np.min(np.abs(np.subtract.outer(xa, xb)), axis=1)

To extend this to 3D,

distances = np.sqrt(np.subtract.outer(xa, xb)**2 + \
    np.subtract.outer(ya, yb)**2 + np.subtract.outer(za, zb)**2)
distance_to_nearest = np.min(distances, axis=1)

If you actually want to know which of the b points is the nearest, use argmin in place of min.

index_of_nearest = np.argmin(distances, axis=1)
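An equivalent way to write this, not part of the answer above, is to stack the coordinates into (n, 3) arrays and let broadcasting form all pairwise differences at once. The sketch below uses made-up random data in place of the real arrays:

```python
import numpy as np

rng = np.random.default_rng(0)        # stand-in data, illustration only
xa, ya, za = rng.random((3, 11))
xb, yb, zb = rng.random((3, 13))

a = np.column_stack([xa, ya, za])     # shape (11, 3)
b = np.column_stack([xb, yb, zb])     # shape (13, 3)

# (11, 1, 3) - (1, 13, 3) broadcasts to (11, 13, 3): every a minus every b
diff = a[:, None, :] - b[None, :, :]
distances = np.linalg.norm(diff, axis=-1)        # shape (11, 13)

distance_to_nearest = distances.min(axis=1)      # shape (11,)
index_of_nearest = distances.argmin(axis=1)      # shape (11,)
```

Both forms allocate the full 11 x 13 distance matrix, so memory use grows with the product of the two set sizes.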

There is more than one way of doing this. Most importantly, there's a trade-off between memory usage and speed. Here's the memory-hungry method, which builds the full matrix of pairwise distances:

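import numpy as np
s = (1, -1)
# broadcasting (13, 1) against (1, 11) builds the full (13, 11) matrix of squared distances
d2 = ((xa.reshape(s) - xb.reshape(s).T)**2
      + (ya.reshape(s) - yb.reshape(s).T)**2
      + (za.reshape(s) - zb.reshape(s).T)**2)
# minimum over the b points (axis 0); the square root turns squared distance into distance
d = np.sqrt(np.min(d2, axis=0))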

The other method would be to iterate over the points in b, which avoids expanding to the full-blown matrix.
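A minimal sketch of that loop, assuming the six coordinate arrays from the question; the function name min_distance_loop and the random test data are hypothetical:

```python
import numpy as np

def min_distance_loop(xa, ya, za, xb, yb, zb):
    """Distance from each a point to its nearest b point, one b point at a time."""
    best = np.full(np.shape(xa), np.inf)   # running minimum of squared distances
    for x, y, z in zip(xb, yb, zb):
        d2 = (xa - x)**2 + (ya - y)**2 + (za - z)**2
        np.minimum(best, d2, out=best)     # keep the smaller value elementwise
    # sqrt is monotonic, so it can be applied once at the end
    return np.sqrt(best)

rng = np.random.default_rng(0)             # stand-in data, illustration only
xa, ya, za = rng.random((3, 11))
xb, yb, zb = rng.random((3, 13))
xfinal = min_distance_loop(xa, ya, za, xb, yb, zb)
```

Only arrays of the size of the a set are held at any time, at the cost of a Python-level loop over the b points.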


 