Python: Comparing the elements of two arrays

Question

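I want to compare the elements of two numpy arrays and delete an element from one of them if the Euclidean distance between their coordinates is less than 1 and their times are equal. data_CD4 and data_CD8 are arrays. Each element of the arrays is a list holding 3D coordinates, with the time as the 4th element (numpy.array([[x, y, z, time], [x, y, z, time], ...])). co is the cutoff value, here 1.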
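To make the layout concrete, here is a tiny hypothetical pair of arrays in the [x, y, z, time] format described above. The values are made up for illustration; the comments note which data_CD8 row the rule should delete.

```python
import numpy as np

# Hypothetical toy data in the layout described above: each row is [x, y, z, time]
data_CD4 = np.array([[0.0, 0.0, 0.0, 1.0],
                     [5.0, 5.0, 5.0, 2.0]])
data_CD8 = np.array([[0.5, 0.0, 0.0, 1.0],   # within 1 of CD4 row 0, same time -> delete
                     [0.5, 0.0, 0.0, 2.0],   # close to CD4 row 0 but different time -> keep
                     [9.0, 9.0, 9.0, 2.0]])  # same time as CD4 row 1 but far away -> keep
co = 1  # cutoff distance

# Coordinates are the first three columns, time is the fourth (index 3)
coords8, times8 = data_CD8[:, :3], data_CD8[:, 3]
print(coords8.shape, times8)
```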

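from scipy.spatial import distance
to_delete = []  # row indices of data_CD8 to remove
for idx, i in enumerate(data_CD8):
    for m in data_CD4:
        if distance.euclidean(i[:3], m[:3]) < co and i[3] == m[3]:
            to_delete.append(idx)
            break  # one match is enough; stop scanning data_CD4
data_CD8 = np.delete(data_CD8, to_delete, 0)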

Is there a faster way to do this? The first array contains 5000 elements and the second 2000, so this takes far too long.

Answer 1

Here is a vectorized approach:

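import numpy as np
# (n4, n8) mask: squared distance below co**2 (avoids a square root)
mask1 = np.sum((data_CD4[:, None, :3] - data_CD8[None, :, :3])**2, axis=-1) < co**2
# (n4, n8) mask: equal time values
mask2 = data_CD4[:, None, 3] == data_CD8[None, :, 3]
# (n8,) mask: True where a data_CD8 row matches any data_CD4 row
mask3 = np.any(np.logical_and(mask1, mask2), axis=0)
data_CD8 = data_CD8[~mask3]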

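Comparing squared distances against co**2 in mask1 should speed up the distance calculation, since it avoids the square-root call. mask1 and mask2 are 2D arrays of shape (len(data_CD4), len(data_CD8)), which np.any collapses to 1D along axis 0. Doing all the deletions at the end with a single boolean index avoids the many reads and writes that calling np.delete repeatedly inside a loop causes.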
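One way to check that the broadcast version selects the same rows as the original double loop is to run both on the same input. The sketch below wraps the answer's code in a function; the function names and the random integer test data are hypothetical and chosen only for this comparison (integer coordinates keep the squared and unsquared comparisons exact at the cutoff).

```python
import numpy as np
from scipy.spatial import distance

def remove_matches_broadcast(data_CD8, data_CD4, co):
    """The broadcasting approach above: squared distances, no sqrt."""
    # (n4, n8) mask: squared distance below co**2 is equivalent to distance below co
    close = np.sum((data_CD4[:, None, :3] - data_CD8[None, :, :3]) ** 2, axis=-1) < co ** 2
    # (n4, n8) mask: equal time values
    same_time = data_CD4[:, None, 3] == data_CD8[None, :, 3]
    # Collapse over the data_CD4 axis: True where a data_CD8 row has any match
    matched = np.any(close & same_time, axis=0)
    return data_CD8[~matched]

def remove_matches_loop(data_CD8, data_CD4, co):
    """Plain double loop, used only as a reference for checking."""
    keep = [not any(distance.euclidean(i[:3], m[:3]) < co and i[3] == m[3]
                    for m in data_CD4)
            for i in data_CD8]
    return data_CD8[np.array(keep, dtype=bool)]

rng = np.random.default_rng(0)
# Hypothetical random test data: integer coordinates and a few distinct time values
data_CD4 = np.column_stack([rng.integers(0, 5, (100, 3)), rng.integers(0, 3, 100)]).astype(float)
data_CD8 = np.column_stack([rng.integers(0, 5, (300, 3)), rng.integers(0, 3, 300)]).astype(float)

fast = remove_matches_broadcast(data_CD8, data_CD4, 1)
slow = remove_matches_loop(data_CD8, data_CD4, 1)
print(np.array_equal(fast, slow))
```

Note that the broadcast builds an intermediate array of shape (len(data_CD4), len(data_CD8), 3); for 2000 × 5000 float64 rows that is about 240 MB, so memory may matter more than speed at that size.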

Speed test:

a = np.random.randint(0, 10, (100, 3))

b = np.random.randint(0, 10, (100, 3))

%timeit cdist(a,b) < 5  #Divakar's answer
10000 loops, best of 3: 133 µs per loop

%timeit np.sum((a[None, :, :] - b[:, None, :]) ** 2, axis = -1) < 25  # My answer
1000 loops, best of 3: 418 µs per loop

Even with the unnecessary square root, the compiled C code (cdist) wins.
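Outside IPython, the same comparison can be reproduced with the standard timeit module. This is a sketch; absolute numbers depend on the machine and NumPy/SciPy versions, so it only prints them. It also asserts that both expressions build the same mask, since distance < 5 and squared distance < 25 are equivalent.

```python
import timeit
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
a = rng.integers(0, 10, (100, 3))
b = rng.integers(0, 10, (100, 3))

# Both expressions give a 100x100 mask indexed as [row of a, row of b]
mask_cdist = cdist(a, b) < 5
mask_bcast = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1) < 25
assert np.array_equal(mask_cdist, mask_bcast)

n = 1000
t_cdist = timeit.timeit(lambda: cdist(a, b) < 5, number=n)
t_bcast = timeit.timeit(
    lambda: np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1) < 25, number=n)
print(f"cdist:     {t_cdist / n * 1e6:.1f} µs per call")
print(f"broadcast: {t_bcast / n * 1e6:.1f} µs per call")
```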

Answer 2

Here's a vectorized approach using SciPy's cdist:

from scipy.spatial import distance

# Get Euclidean distances between the first three columns of data_CD8 and data_CD4
dists = distance.cdist(data_CD8[:,:3], data_CD4[:,:3])

# Get a mask of the distances that are within co. This sets up the
# first conditional requirement, as in the loop version of the original code.
mask1 = dists < co

# Take the fourth column (index 3) of the two input arrays, which holds the time values.
# Compare every time value of data_CD8 against every time value of data_CD4.
# This sets up the second conditional requirement.
# We add a new axis with None so that NumPy broadcasting
# lets us do these comparisons in a vectorized manner.
mask2 = data_CD8[:,3,None] == data_CD4[:,3]

# Combine the two masks and look for any match corresponding to any
# element of data_CD4. Since the masks are set up such that the second axis
# represents data_CD4, we need numpy.any along axis=1 on the combined mask.
# A final inversion of the mask is needed because we are deleting the rows that
# satisfy these requirements.
mask3 = ~((mask1 & mask2).any(1))

# Finally, use boolean indexing to select the valid rows of data_CD8
out = data_CD8[mask3]
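As a usage sketch, the cdist steps can be wrapped in a function and run on a small hand-made input. The function name and the toy arrays are hypothetical. Row 0 of data_CD8 is 0.5 away from a data_CD4 point at the same time, row 2 is 0.2 away at the same time, and row 1 is close in space but at a different time, so only row 1 should survive.

```python
import numpy as np
from scipy.spatial import distance

def remove_matches_cdist(data_CD8, data_CD4, co):
    # (n8, n4) Euclidean distances between the xyz columns
    dists = distance.cdist(data_CD8[:, :3], data_CD4[:, :3])
    # (n8, n4) mask: within co AND same time (column index 3)
    match = (dists < co) & (data_CD8[:, 3, None] == data_CD4[:, 3])
    # Keep the rows of data_CD8 that match no row of data_CD4
    return data_CD8[~match.any(axis=1)]

data_CD4 = np.array([[0.0, 0.0, 0.0, 1.0],
                     [5.0, 5.0, 5.0, 2.0]])
data_CD8 = np.array([[0.5, 0.0, 0.0, 1.0],
                     [0.5, 0.0, 0.0, 2.0],
                     [5.0, 5.2, 5.0, 2.0]])

result = remove_matches_cdist(data_CD8, data_CD4, 1)
print(result)
```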

Answer 3

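If you have to compare every item in data_CD4 against the items in data_CD8 while deleting data from data_CD8, it is better to shrink the second iterable on each iteration, although of course this depends on what your most common case is.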

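from scipy.spatial import distance
for m in data_CD4:
    # Row indices of data_CD8 within co of m and at the same time
    hits = [idx for idx, i in enumerate(data_CD8)
            if distance.euclidean(i[:3], m[:3]) < co and i[3] == m[3]]
    # Delete them now, so later passes iterate over a smaller data_CD8
    data_CD8 = np.delete(data_CD8, hits, 0)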

Going by big-O notation: since this is O(n^2), I don't see a faster solution.
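A related refinement, not taken from any of the answers above, is to use the time-equality condition to cut down the work: only rows that share a time value ever need a distance check. The sketch below (function name hypothetical) groups by time and calls cdist once per shared time. Its worst case, where every row has the same time, is still one distance per pair, consistent with the O(n^2) bound, but when times are spread out each cdist call is much smaller.

```python
import numpy as np
from scipy.spatial.distance import cdist

def remove_matches_by_time(data_CD8, data_CD4, co):
    """Hypothetical variant: only compare rows that share a time value."""
    matched = np.zeros(len(data_CD8), dtype=bool)
    # Only times present in both arrays can produce a match
    for t in np.intersect1d(data_CD8[:, 3], data_CD4[:, 3]):
        rows8 = np.flatnonzero(data_CD8[:, 3] == t)   # indices into data_CD8
        pts4 = data_CD4[data_CD4[:, 3] == t, :3]      # data_CD4 coordinates at time t
        d = cdist(data_CD8[rows8, :3], pts4)          # shape (len(rows8), len(pts4))
        matched[rows8] = (d < co).any(axis=1)
    return data_CD8[~matched]

data_CD4 = np.array([[0.0, 0.0, 0.0, 1.0],
                     [5.0, 5.0, 5.0, 2.0]])
data_CD8 = np.array([[0.5, 0.0, 0.0, 1.0],
                     [0.5, 0.0, 0.0, 2.0],
                     [5.0, 5.2, 5.0, 2.0]])
print(remove_matches_by_time(data_CD8, data_CD4, 1))
```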

3 answers

Answer 1
2 votes, 2017-04-04 12:14:41

Answer 2
2 votes, accepted, 2017-04-04 12:25:23

Answer 3
0 votes, 2017-04-04 12:02:02
