[英]Fastest way to compute distances between consecutive vectors with numpy/scipy
I have a line growth algorithm where I need to:我有一个线增长算法,我需要:
I usually do this in a very naive manner (see code below) and would like to know how to compute distances between consecutive vectors the fastest way possible using numpy (and scipy if needed).我通常以非常幼稚的方式执行此操作(请参见下面的代码),并且想知道如何使用 numpy(如果需要,还可以使用 scipy)以最快的方式计算连续向量之间的距离。
import math
threshold = 10
vectorList = [(0, 10), (4, 8), (14, 14), (16, 19), (35, 16)]
for i in xrange(len(vectorList)):
p1 = vectorList[i]
p2 = vectorList[i+1]
d = math.sqrt((p2[0] - p1[0])**2 + (p2[1] - p1[1])**2)
if d >= threshold:
pmid = ((p1[0] + p2[0]) * .5, (p1[1] + p2[1]) * .5)
vectorList.insert(i+1, pmid)
EDIT: I've come up with the following workaround but I'm still concerned with the distance computation.编辑:我想出了以下解决方法,但我仍然关心距离计算。
I would need to compute the distances between a vector and its next neighbor in a list instead of calculating a whole distance matrix (all vectors against each others) as I'm doing here.我需要计算一个向量与其在列表中的下一个邻居之间的距离,而不是像我在这里所做的那样计算整个距离矩阵(所有向量相互之间)。
import numpy as np
vectorList = [(0, 10), (4, 8), (14, 14), (16, 19), (35, 16)]
arr = np.asarray(vectorList).astype(float)
dis = distance.cdist(arr, arr).diagonal(1)
idx = np.where(dis > 10)[0]
vec = (arr[idx] + arr[idx+1]) * .5
arr = np.insert(arr, idx+1, vec, 0)
# output
array([[ 0. , 10. ],[ 4. , 8. ],[ 9. , 11. ],[14. , 14. ],[16. , 19. ],[25.5, 17.5],[35. , 16. ]])
In [209]: vectorList = [(0, 10), (4, 8), (14, 14), (16, 19), (35, 16), (39,50)]
In [210]: vectorList
Out[210]: [(0, 10), (4, 8), (14, 14), (16, 19), (35, 16), (39, 50)]
I added a point, making 3 possible insertion points.我添加了一个点,使 3 个可能的插入点。
In [212]: np.diff(vectorList, axis=0)
Out[212]:
array([[ 4, -2],
[10, 6],
[ 2, 5],
[19, -3],
[ 4, 34]])
In [213]: np.sum(np.diff(vectorList, axis=0)**2,1)
Out[213]: array([ 20, 136, 29, 370, 1172])
distances:距离:
In [214]: np.sqrt(np.sum(np.diff(vectorList, axis=0)**2,1))
Out[214]: array([ 4.47213595, 11.66190379, 5.38516481, 19.23538406, 34.23448554])
Mean values:平均值:
In [216]: arr = np.array(vectorList)
In [217]: arr
Out[217]:
array([[ 0, 10],
[ 4, 8],
[14, 14],
[16, 19],
[35, 16],
[39, 50]])
In [218]: (arr[:-1]+arr[1:])/2
Out[218]:
array([[ 2. , 9. ],
[ 9. , 11. ],
[15. , 16.5],
[25.5, 17.5],
[37. , 33. ]])
I could do something similar without diff
:我可以在没有diff
的情况下做类似的事情:
d = np.sqrt(np.sum((arr[1:]-arr[:-1])**2,1))
The jumps that exceed the threshhold:超过阈值的跳跃:
In [224]: idx = np.nonzero(d>10)
In [225]: idx
Out[225]: (array([1, 3, 4]),)
In [227]: _218[idx] # the mean values to insert
Out[227]:
array([[ 9. , 11. ],
[25.5, 17.5],
[37. , 33. ]])
Use np.insert
to insert all values.使用np.insert
插入所有值。
In [232]: np.insert(arr.astype(float), idx[0]+1, _227, axis=0)
Out[232]:
array([[ 0. , 10. ],
[ 4. , 8. ],
[ 9. , 11. ],
[14. , 14. ],
[16. , 19. ],
[25.5, 17.5],
[35. , 16. ],
[37. , 33. ],
[39. , 50. ]])
Here is an approach using sklearn NearestNeighbors这是一种使用 sklearn NearestNeighbors的方法
from sklearn.neighbors import NearestNeighbors
import numpy as np
def distance(p1,p2):
return np.sqrt(np.sum(np.diff([p1,p2], axis=0)**2,1))
def find_shortest_distance(vectorList):
nbrs = NearestNeighbors(n_neighbors=2, metric=distance).fit(vectorList)
distances, indices = nbrs.kneighbors(vectorList)
result = distances[:, 1]
return result
vectorList = [(0, 10), (4, 8), (14, 14), (16, 19), (35, 16)]
which returns:返回:
result
Out[57]: array([ 4.47213595, 4.47213595, 5.38516481, 5.38516481, 19.23538406])
This might be overkill for your problem, but this solution doesn't require the points to be in consecutive order.对于您的问题,这可能有点矫枉过正,但此解决方案不要求这些点按连续顺序排列。
Timings (slower than numpy):计时(比 numpy 慢):
%timeit find_shortest_distance(vectorList)
478 µs ± 2.68 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.