用 numpy/scipy 计算连续向量之间距离的最快方法

Question

I have a line growth algorithm where I need to:我有一个线增长算法，我需要：

calculate the distances (euclidean) between consecutive vectors in an array计算数组中连续向量之间的距离（欧几里得）
insert a new vector where a distance is greater than a specific threshold在距离大于特定阈值的地方插入一个新向量

I usually do this in a very naive manner (see code below) and would like to know how to compute distances between consecutive vectors the fastest way possible using numpy (and scipy if needed).我通常以非常幼稚的方式执行此操作（请参见下面的代码），并且想知道如何使用 numpy（如果需要，还可以使用 scipy）以最快的方式计算连续向量之间的距离。

import math


threshold = 10
vectorList = [(0, 10), (4, 8), (14, 14), (16, 19), (35, 16)]

for i in xrange(len(vectorList)):
    p1 = vectorList[i]
    p2 = vectorList[i+1]
    d = math.sqrt((p2[0] - p1[0])**2 + (p2[1] - p1[1])**2)
    if d >= threshold:
        pmid = ((p1[0] + p2[0]) * .5, (p1[1] + p2[1]) * .5)
        vectorList.insert(i+1, pmid)

EDIT: I've come up with the following workaround but I'm still concerned with the distance computation.编辑：我想出了以下解决方法，但我仍然关心距离计算。

I would need to compute the distances between a vector and its next neighbor in a list instead of calculating a whole distance matrix (all vectors against each others) as I'm doing here.我需要计算一个向量与其在列表中的下一个邻居之间的距离，而不是像我在这里所做的那样计算整个距离矩阵（所有向量相互之间）。

import numpy as np

vectorList = [(0, 10), (4, 8), (14, 14), (16, 19), (35, 16)]
arr = np.asarray(vectorList).astype(float)


dis = distance.cdist(arr, arr).diagonal(1)
idx = np.where(dis > 10)[0]
vec = (arr[idx] + arr[idx+1]) * .5
arr = np.insert(arr, idx+1, vec, 0)

# output
array([[ 0. , 10. ],[ 4. ,  8. ],[ 9. , 11. ],[14. , 14. ],[16. , 19. ],[25.5, 17.5],[35. , 16. ]])

Answer 1

In [209]: vectorList = [(0, 10), (4, 8), (14, 14), (16, 19), (35, 16), (39,50)] 
In [210]: vectorList                                                            
Out[210]: [(0, 10), (4, 8), (14, 14), (16, 19), (35, 16), (39, 50)]

I added a point, making 3 possible insertion points.我添加了一个点，使 3 个可能的插入点。

In [212]: np.diff(vectorList, axis=0)                                           
Out[212]: 
array([[ 4, -2],
       [10,  6],
       [ 2,  5],
       [19, -3],
       [ 4, 34]])
In [213]: np.sum(np.diff(vectorList, axis=0)**2,1)                              
Out[213]: array([  20,  136,   29,  370, 1172])

distances:距离：

In [214]: np.sqrt(np.sum(np.diff(vectorList, axis=0)**2,1))                     
Out[214]: array([ 4.47213595, 11.66190379,  5.38516481, 19.23538406, 34.23448554])

Mean values:平均值：

In [216]: arr = np.array(vectorList)                                            
In [217]: arr                                                                   
Out[217]: 
array([[ 0, 10],
       [ 4,  8],
       [14, 14],
       [16, 19],
       [35, 16],
       [39, 50]])

In [218]: (arr[:-1]+arr[1:])/2                                                  
Out[218]: 
array([[ 2. ,  9. ],
       [ 9. , 11. ],
       [15. , 16.5],
       [25.5, 17.5],
       [37. , 33. ]])

I could do something similar without diff :我可以在没有diff的情况下做类似的事情：

d = np.sqrt(np.sum((arr[1:]-arr[:-1])**2,1))

The jumps that exceed the threshhold:超过阈值的跳跃：

In [224]: idx = np.nonzero(d>10)                                                
In [225]: idx                                                                   
Out[225]: (array([1, 3, 4]),)
In [227]: _218[idx]       # the mean values to insert                                                            
Out[227]: 
array([[ 9. , 11. ],
       [25.5, 17.5],
       [37. , 33. ]])

Use np.insert to insert all values.使用np.insert插入所有值。

In [232]: np.insert(arr.astype(float), idx[0]+1, _227, axis=0)                  
Out[232]: 
array([[ 0. , 10. ],
       [ 4. ,  8. ],
       [ 9. , 11. ],
       [14. , 14. ],
       [16. , 19. ],
       [25.5, 17.5],
       [35. , 16. ],
       [37. , 33. ],
       [39. , 50. ]])

Answer 2

Here is an approach using sklearn NearestNeighbors这是一种使用 sklearn NearestNeighbors的方法

from sklearn.neighbors import NearestNeighbors
import numpy as np

def distance(p1,p2):
    return np.sqrt(np.sum(np.diff([p1,p2], axis=0)**2,1))

def find_shortest_distance(vectorList):
    nbrs = NearestNeighbors(n_neighbors=2, metric=distance).fit(vectorList)
    distances, indices = nbrs.kneighbors(vectorList)

    result = distances[:, 1]
    return result

vectorList = [(0, 10), (4, 8), (14, 14), (16, 19), (35, 16)]

which returns:返回：

result
Out[57]: array([ 4.47213595,  4.47213595,  5.38516481,  5.38516481, 19.23538406])

This might be overkill for your problem, but this solution doesn't require the points to be in consecutive order.对于您的问题，这可能有点矫枉过正，但此解决方案不要求这些点按连续顺序排列。

Timings (slower than numpy):计时（比 numpy 慢）：

%timeit find_shortest_distance(vectorList)
478 µs ± 2.68 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

用 numpy/scipy 计算连续向量之间距离的最快方法

问题描述

2 个解决方案

解决方案1
2 已采纳 2019-10-22 15:49:30

解决方案2
0 2019-10-22 16:25:12

用 numpy/scipy 计算连续向量之间距离的最快方法

问题描述

2 个解决方案

解决方案1 2 已采纳 2019-10-22 15:49:30

解决方案2 0 2019-10-22 16:25:12

解决方案1
2 已采纳 2019-10-22 15:49:30

解决方案2
0 2019-10-22 16:25:12