![](/img/trans.png)
[英]Minimum Euclidean distance between points in two different Numpy arrays, not within
[英]Fast weighted euclidean distance between points in arrays
我需要有效地計算每個歐幾里得加權距離x,y
的給定陣列中的點到每個其它x,y
在另一個陣列點。 這是我的代碼按預期工作:
import numpy as np
import random
def rand_data(integ):
'''
Function that generates 'integ' random values between [0.,1.)
'''
rand_dat = [random.random() for _ in range(integ)]
return rand_dat
def weighted_dist(indx, x_coo, y_coo):
'''
Function that calculates *weighted* euclidean distances.
'''
dist_point_list = []
# Iterate through every point in array_2.
for indx2, x_coo2 in enumerate(array_2[0]):
y_coo2 = array_2[1][indx2]
# Weighted distance in x.
x_dist_weight = (x_coo-x_coo2)/w_data[0][indx]
# Weighted distance in y.
y_dist_weight = (y_coo-y_coo2)/w_data[1][indx]
# Weighted distance between point from array_1 passed and this point
# from array_2.
dist = np.sqrt(x_dist_weight**2 + y_dist_weight**2)
# Append weighted distance value to list.
dist_point_list.append(round(dist, 8))
return dist_point_list
# Generate random x,y data points.
array_1 = np.array([rand_data(10), rand_data(10)], dtype=float)
# Generate weights for each x,y coord for points in array_1.
w_data = np.array([rand_data(10), rand_data(10)], dtype=float)
# Generate second larger array.
array_2 = np.array([rand_data(100), rand_data(100)], dtype=float)
# Obtain *weighted* distances for every point in array_1 to every point in array_2.
dist = []
# Iterate through every point in array_1.
for indx, x_coo in enumerate(array_1[0]):
y_coo = array_1[1][indx]
# Call function to get weighted distances for this point to every point in
# array_2.
dist.append(weighted_dist(indx, x_coo, y_coo))
最終列表dist
包含與第一個數組中的點一樣多的子列表,其中每個子元素的數量與第二個中的點數相同(加權距離)。
我想知道是否有辦法使這個代碼更有效率,也許使用cdist函數,因為當數組有很多元素(在我的情況下它們有)時,當我需要檢查時,這個過程變得相當昂貴許多陣列的距離(我也有)
@Evan和@Martinis Group走在正確的軌道上 - 擴展Evan的答案,這是一個使用廣播來快速計算沒有Python循環的n維加權歐氏距離的函數:
import numpy as np
def fast_wdist(A, B, W):
"""
Compute the weighted euclidean distance between two arrays of points:
D{i,j} =
sqrt( ((A{0,i}-B{0,j})/W{0,i})^2 + ... + ((A{k,i}-B{k,j})/W{k,i})^2 )
inputs:
A is an (k, m) array of coordinates
B is an (k, n) array of coordinates
W is an (k, m) array of weights
returns:
D is an (m, n) array of weighted euclidean distances
"""
# compute the differences and apply the weights in one go using
# broadcasting jujitsu. the result is (n, k, m)
wdiff = (A[np.newaxis,...] - B[np.newaxis,...].T) / W[np.newaxis,...]
# square and sum over the second axis, take the sqrt and transpose. the
# result is an (m, n) array of weighted euclidean distances
D = np.sqrt((wdiff*wdiff).sum(1)).T
return D
要檢查這是否正常,我們將它與使用嵌套Python循環的較慢版本進行比較:
def slow_wdist(A, B, W):
k,m = A.shape
_,n = B.shape
D = np.zeros((m, n))
for ii in xrange(m):
for jj in xrange(n):
wdiff = (A[:,ii] - B[:,jj]) / W[:,ii]
D[ii,jj] = np.sqrt((wdiff**2).sum())
return D
首先,讓我們確保兩個函數給出相同的答案:
# make some random points and weights
def setup(k=2, m=100, n=300):
return np.random.randn(k,m), np.random.randn(k,n),np.random.randn(k,m)
a, b, w = setup()
d0 = slow_wdist(a, b, w)
d1 = fast_wdist(a, b, w)
print np.allclose(d0, d1)
# True
不用說,使用廣播而不是Python循環的版本要快幾個數量級:
%%timeit a, b, w = setup()
slow_wdist(a, b, w)
# 1 loops, best of 3: 647 ms per loop
%%timeit a, b, w = setup()
fast_wdist(a, b, w)
# 1000 loops, best of 3: 620 us per loop
如果您不需要加權距離,可以使用cdist
。 如果您需要加權距離和性能,請創建適當輸出大小的數組,並使用Numba或Parakeet等自動加速器,或使用Cython手動調整代碼。
您可以使用類似於以下內容的代碼來避免循環:
def compute_distances(A, B, W):
Ax = A[:,0].reshape(1, A.shape[0])
Bx = B[:,0].reshape(A.shape[0], 1)
dx = Bx-Ax
# Same for dy
dist = np.sqrt(dx**2 + dy**2) * W
return dist
這將在Python運行速度快了很多那個什么,只要你有數組足夠的內存循環。
您可以嘗試刪除平方根,因為如果a> b,則遵循平方> b平方...並且計算機通常在平方根處真的很慢。
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.