Find the closest three x,y points in three arrays

Question

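In Python, I have three lists containing x and y coordinates. Each list contains 128 points. How can I efficiently find the three closest points (one from each list)?

This is my working Python code, but it isn't efficient enough: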

    import math

    def findclosest(c1, c2, c3):
        mina = 999999999
        for i in c1:
            for j in c2:
                for k in c3:
                    # calculate sum of distances between points
                    d = xy3dist(i, j, k)
                    if d < mina:
                        mina = d
        return mina

    def xy3dist(a, b, c):
        # perimeter of the triangle formed by the three points
        l1 = math.sqrt((a[0]-b[0]) ** 2 + (a[1]-b[1]) ** 2)
        l2 = math.sqrt((b[0]-c[0]) ** 2 + (b[1]-c[1]) ** 2)
        l3 = math.sqrt((a[0]-c[0]) ** 2 + (a[1]-c[1]) ** 2)
        return l1 + l2 + l3

Any idea how this can be done using numpy?

Answer 1

You can use Numpy's broadcasting feature to vectorize the two inner loops:


import numpy as np

def findclosest(c1, c2, c3):
   c1 = np.asarray(c1)
   c2 = np.asarray(c2)
   c3 = np.asarray(c3)

   for arr in (c1, c2, c3):
       if not (arr.ndim == 2 and arr.shape[1] == 2):
           raise ValueError("expected arrays of 2D coordinates")

   min_val = np.inf
   min_pos = None

   for a, i in enumerate(c1):
       d = xy3dist(i, c2.T[:,:,np.newaxis], c3.T[:,np.newaxis,:])
       k = np.argmin(d)

       if d.flat[k] < min_val:
           min_val = d.flat[k]
           b, c = np.unravel_index(k, d.shape)
           min_pos = (a, b, c)

       print(a, min_val, d.min())

   return min_val, min_pos

def xy3dist(a, b, c):
   l1 = np.sqrt((a[0]-b[0]) ** 2 + (a[1]-b[1]) ** 2 )   
   l2 = np.sqrt((b[0]-c[0]) ** 2 + (b[1]-c[1]) ** 2 )   
   l3 = np.sqrt((a[0]-c[0]) ** 2 + (a[1]-c[1]) ** 2 )       
   return l1+l2+l3

np.random.seed(1234)
c1 = np.random.rand(5, 2)
c2 = np.random.rand(9, 2)
c3 = np.random.rand(7, 2)

val, pos = findclosest(c1, c2, c3)

a, b, c = pos
print(val, xy3dist(c1[a], c2[b], c3[c]))

It's also possible to vectorize all 3 loops:


def findclosest2(c1, c2, c3):
    c1 = np.asarray(c1)
    c2 = np.asarray(c2)
    c3 = np.asarray(c3)
    d = xy3dist(c1.T[:,:,np.newaxis,np.newaxis], c2.T[:,np.newaxis,:,np.newaxis], c3.T[:,np.newaxis,np.newaxis,:])
    k = np.argmin(d)
    min_val = d.flat[k]
    a, b, c = np.unravel_index(k, d.shape)
    min_pos = (a, b, c)
    return min_val, min_pos

If your arrays are very big, findclosest may be better than findclosest2, as it uses less memory. (And if your arrays are huge, vectorize only the innermost loop.)
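To put rough numbers on the memory point, here is a minimal sketch that only counts the bytes of the final distance array. The helper name `distance_array_nbytes` is made up for this illustration, and the temporaries created inside xy3dist come on top of these figures.

```python
import numpy as np

def distance_array_nbytes(n1, n2, n3, dtype=np.float64):
    """Bytes needed for one (n1, n2, n3) array of the given dtype.

    Hypothetical helper, not part of either answer.
    """
    return n1 * n2 * n3 * np.dtype(dtype).itemsize

# findclosest2 builds the full 3-D array of summed distances at once.
full = distance_array_nbytes(128, 128, 128)

# findclosest only holds one (n2, n3) slice per iteration of its outer loop.
per_slice = distance_array_nbytes(1, 128, 128)

print(full / 2**20, "MiB for the full array")
print(per_slice / 2**10, "KiB per slice")
```

For 128 points per list both fit in memory easily; the gap only starts to matter as the lists grow, since the full array scales with the product of the three lengths.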

You can learn more about what np.newaxis does by searching for "numpy broadcasting".
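As a small illustration of the indexing used in findclosest, the sketch below prints the shapes involved. The array sizes and the random data are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
c2 = rng.random((9, 2))   # 9 points, columns are x and y
c3 = rng.random((7, 2))   # 7 points

# Transpose so that axis 0 selects x or y, then insert a length-1 axis.
b = c2.T[:, :, np.newaxis]   # shape (2, 9, 1)
c = c3.T[:, np.newaxis, :]   # shape (2, 1, 7)

# b[0] has shape (9, 1) and c[0] has shape (1, 7). Subtracting them
# broadcasts to (9, 7): every x in c2 against every x in c3.
dx = b[0] - c[0]

print(b.shape, c.shape, dx.shape)
```

Each `dx[i, j]` equals `c2[i, 0] - c3[j, 0]`, which is what the two inner loops would have computed one pair at a time.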

Answer 2

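Let's try timing some of the different solutions.

I'm going to initialize three arrays using numpy's random functions. If your existing variables are lists of tuples or lists of lists, just call np.array on them.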

import numpy as np

c1 = np.random.normal(size=(128, 2))
c2 = np.random.normal(size=(128, 2))
c3 = np.random.normal(size=(128, 2))
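If your points start out as a list of tuples, the conversion mentioned above might look like the following sketch. The sample `points` list is invented for the example.

```python
import numpy as np

# Hypothetical input: points stored as a list of (x, y) tuples.
points = [(0.0, 1.0), (2.5, -1.0), (3.0, 4.0)]

arr = np.array(points, dtype=float)
print(arr.shape)   # (3, 2): one row per point, columns are x and y

# Transposing gives the x and y coordinates as separate 1-D arrays.
xs, ys = arr.T
print(xs, ys)
```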

First, let's time your code so that we have a starting point.

import math

def findclosest(c1, c2, c3):
    mina = 999999999
    for i in c1:
        for j in c2:
            for k in c3:
                # calculate sum of distances between points
                d = xy3dist(i, j, k)
                if d < mina:
                    mina = d
    return mina

def xy3dist(a, b, c):
    l1 = math.sqrt((a[0]-b[0]) ** 2 + (a[1]-b[1]) ** 2)
    l2 = math.sqrt((b[0]-c[0]) ** 2 + (b[1]-c[1]) ** 2)
    l3 = math.sqrt((a[0]-c[0]) ** 2 + (a[1]-c[1]) ** 2)
    return l1 + l2 + l3

%timeit findclosest(c1, c2, c3)
# 1 loops, best of 3: 23.3 s per loop

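A function that may be useful is scipy.spatial.distance.cdist, which computes all pairwise distances between two arrays of points. So we can use it to precompute and store all the distances, and then simply look up and add the distances from those arrays. I'm also going to use itertools.product to simplify the looping, though it won't do any speeding up.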

from scipy.spatial.distance import cdist
from itertools import product

def findclosest_usingcdist(c1, c2, c3):
    dists_12 = cdist(c1, c2)
    dists_23 = cdist(c2, c3)
    dists_13 = cdist(c1, c3)

    min_dist = np.inf
    ind_gen = product(range(len(c1)), range(len(c2)), range(len(c3)))
    for i1, i2, i3 in ind_gen:
        dist = dists_12[i1, i2] + dists_23[i2, i3] + dists_13[i1, i3]
        if dist < min_dist:
            min_dist = dist
            min_points = (c1[i1], c2[i2], c3[i3])

    return min_dist, min_points

%timeit findclosest_usingcdist(c1, c2, c3)
# 1 loops, best of 3: 2.02 s per loop

So using cdist gets us an order-of-magnitude speedup.
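For anyone unfamiliar with the pairwise matrices used above, here is a NumPy-only sketch of what such a matrix contains. The helper `pairwise_dist` is a name made up for this illustration; it is meant to mirror what cdist computes with its default Euclidean metric, not to replace it.

```python
import numpy as np

def pairwise_dist(p, q):
    """Euclidean distance between every row of p and every row of q.

    Hypothetical helper for illustration only.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # diff[i, j] is the vector from q[j] to p[i]; shape (len(p), len(q), 2)
    diff = p[:, np.newaxis, :] - q[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))   # shape (len(p), len(q))

p = [(0, 0), (3, 4)]
q = [(0, 0), (6, 8), (3, 0)]
d = pairwise_dist(p, q)
print(d.shape)
print(d)
```

Entry `d[i, j]` is the distance from `p[i]` to `q[j]`, so the triple loop only needs three table lookups and two additions per combination instead of three square roots.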

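However, the cdist version doesn't even compare to @pv's answer. Here is one of his implementations, stripped of a few things so that it compares better with the previous solutions (see @pv's answer for his original version, which returns the indices of the points).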

def findclosest2(c1, c2, c3):
    d = xy3dist(c1.T[:,:,np.newaxis,np.newaxis], 
                c2.T[:,np.newaxis,:,np.newaxis], 
                c3.T[:,np.newaxis,np.newaxis,:])
    k = np.argmin(d)
    min_val = d.flat[k]
    i1, i2, i3 = np.unravel_index(k, d.shape)
    min_points = (c1[i1], c2[i2], c3[i3])
    return min_val, min_points 

def xy3dist(a, b, c):
    l1 = np.sqrt((a[0]-b[0]) ** 2 + (a[1]-b[1]) ** 2 )   
    l2 = np.sqrt((b[0]-c[0]) ** 2 + (b[1]-c[1]) ** 2 )   
    l3 = np.sqrt((a[0]-c[0]) ** 2 + (a[1]-c[1]) ** 2 )       
    return l1+l2+l3

%timeit findclosest2(c1, c2, c3)
# 100 loops, best of 3: 19.1 ms per loop

So this is a huge speedup, and definitely the right answer.
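The two ideas can also be combined: precompute the three pairwise matrices, then add them with broadcasting instead of looping. This is a sketch of that variation rather than code from either answer, and it was not part of the timings above. The names `pairwise_dist`, `findclosest_pairwise` and `brute_force` are invented here, and the small random arrays are only there to check the result against a plain triple loop.

```python
import numpy as np

def pairwise_dist(p, q):
    """Euclidean distances between all rows of p and all rows of q."""
    diff = p[:, np.newaxis, :] - q[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def findclosest_pairwise(c1, c2, c3):
    c1, c2, c3 = (np.asarray(c, dtype=float) for c in (c1, c2, c3))
    d12 = pairwise_dist(c1, c2)   # shape (n1, n2)
    d23 = pairwise_dist(c2, c3)   # shape (n2, n3)
    d13 = pairwise_dist(c1, c3)   # shape (n1, n3)
    # Broadcast the three 2-D tables into one (n1, n2, n3) array of sums.
    total = (d12[:, :, np.newaxis]
             + d23[np.newaxis, :, :]
             + d13[:, np.newaxis, :])
    k = np.argmin(total)
    i1, i2, i3 = np.unravel_index(k, total.shape)
    return total[i1, i2, i3], (i1, i2, i3)

def brute_force(c1, c2, c3):
    """Reference triple loop, used only to check the result."""
    best = np.inf
    for a in c1:
        for b in c2:
            for c in c3:
                d = (np.linalg.norm(a - b) + np.linalg.norm(b - c)
                     + np.linalg.norm(a - c))
                best = min(best, d)
    return best

rng = np.random.default_rng(1234)
c1 = rng.random((5, 2))
c2 = rng.random((9, 2))
c3 = rng.random((7, 2))

val, pos = findclosest_pairwise(c1, c2, c3)
print(val, pos, np.isclose(val, brute_force(c1, c2, c3)))
```

Like findclosest2, this still builds the full (n1, n2, n3) array, so the memory caveat from @pv's answer applies to it as well.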

Answer 1
3 (accepted) 2014-09-19 15:59:16

Answer 2
2 2014-09-19 16:01:05
