Removing rows from a numpy array based on presence/absence in other arrays

Question

I have 3 different numpy arrays, but they all start with two columns that contain the day of the year and the time. For example:

   dyn = [[  83   12   7.10555687e-01 ...,   6.99242766e-01   6.868761e-01]
         [  83   13   8.28091972e-01 ...,   8.33734118e-01   8.47266838e-01]
         [  83   14   8.79437354e-01 ...,   8.73598144e-01   8.57156213e-01]
         [  161   23   3.28109488e-01 ...,   2.83043689e-01  2.59775391e-01]
         [  162   0    2.23502046e-01 ...,   1.96972086e-01  1.65565263e-01]
         [  162   1   2.51653976e-01 ...,   2.17209188e-01   1.42133495e-1]]

   us = [[  133   18   3.00483815e+02 ...,   1.94277561e+00   2.8168959e+00]
        [  133   19   2.98832620e+02 ...,   2.42506475e+00   2.99730800e+00]
        [  133   20   2.96706105e+02 ...,   3.16851622e+00   4.41187088e+00]
        [  161   23   2.88336560e+02 ...,   3.44864070e-01   3.85055635e-01]
        [  162   0    2.87593240e+02 ...,   2.93002410e-01   2.67112490e-01]
        [  162   2    2.86992180e+02 ...,   7.08996730e-02   2.6403210e-01]]

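I need to remove every row whose day and time do not appear in all 3 arrays. In other words, I want to be left with 3 arrays whose first 2 columns are identical.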

So the resulting smaller arrays would be:

dyn= [[  161   23   3.28109488e-01 ...,   2.83043689e-01  2.59775391e-01]
     [  162   0    2.23502046e-01 ...,   1.96972086e-01  1.65565263e-01]]

us= [[  161   23   2.88336560e+02 ...,   3.44864070e-01   3.85055635e-01]
    [  162   0    2.87593240e+02 ...,   2.93002410e-01   2.67112490e-01]]

(but then also restricted by the third array)

I have tried using sort/zip, but I am not sure how to apply it to a 2D array, for example:

X= dyn
Y = us
xsorted=[x for (y,x) in sorted(zip(Y[:,1],X[:,1]), key=lambda pair: pair[0])]

I also tried a loop, but it only works when the same day/time is at the same position in each array, which doesn't help:

for i in range(100):
     dyn_small=dyn[dyn[:,0]==us[i,0]]

Answer 1

Assuming A, B and C are the input arrays, here is a vectorized approach that makes heavy use of broadcasting:

import numpy as np

# Get masks comparing all rows of A with B, and all rows of B with C
M1 = (A[:,None,:2] == B[:,:2])
M2 = (B[:,None,:2] == C[:,:2])

# Combine the two masks into a joint 3D mask and get the indices of matches.
# The indices (I,J,K) of the 3D mask give the row numbers in each of the
# input arrays for the keys that are present in all of them.
# Thus, in (I,J,K), I is the matching row number in A, J in B and K in C.
I,J,K = np.where((M1[:,:,None,:] & M2).all(3))

# Finally, select rows of A, B and C with I, J and K respectively
A_new = A[I]
B_new = B[J]
C_new = C[K]
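
To make the broadcasting step easier to follow, here is a minimal sketch that prints the intermediate shapes. The tiny arrays are hypothetical and chosen only for illustration; they are not the question's data.

```python
import numpy as np

# Hypothetical inputs with 3, 2 and 4 rows; columns 0 and 1 form the key.
A = np.array([[1, 0, 10], [1, 1, 11], [2, 0, 12]])
B = np.array([[2, 0, 20], [1, 1, 21]])
C = np.array([[9, 9, 30], [1, 1, 31], [2, 0, 32], [3, 3, 33]])

# M1[i, j, :] compares the key of A[i] with the key of B[j], column by column.
M1 = A[:, None, :2] == B[:, :2]      # shape (len(A), len(B), 2)
# M2[j, k, :] compares the key of B[j] with the key of C[k].
M2 = B[:, None, :2] == C[:, :2]      # shape (len(B), len(C), 2)

# Adding an axis to M1 lets it broadcast against M2 over the C dimension.
joint = M1[:, :, None, :] & M2       # shape (len(A), len(B), len(C), 2)
match = joint.all(3)                 # True where both key columns agree

I, J, K = np.where(match)
print(M1.shape, M2.shape, joint.shape, match.shape)
print(I, J, K)
```

Note that the joint mask holds len(A) * len(B) * len(C) * 2 booleans, so memory use grows quickly with the number of rows.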

Sample run:

1) Inputs:

In [116]: A
Out[116]: 
array([[ 83,  12, 443],
       [ 83,  13, 565],
       [ 83,  14, 342],
       [161,  23, 431],
       [162,   0, 113],
       [162,   1, 313]])

In [117]: B
Out[117]: 
array([[161,  23, 999],
       [  5,   1,  13],
       [ 83,  12,  15],
       [162,   0,  12],
       [  4,   3,  11]])

In [118]: C
Out[118]: 
array([[ 11,  23, 143],
       [162,   0, 113],
       [161,  23, 545]])

2) Run the solution code to get the matching row IDs and extract the rows:

In [119]: M1 = (A[:,None,:2] == B[:,:2])
     ...: M2 = (B[:,None,:2] == C[:,:2])
     ...: 

In [120]: I,J,K = np.where((M1[:,:,None,:] & M2).all(3))

In [121]: A[I]
Out[121]: 
array([[161,  23, 431],
       [162,   0, 113]])

In [122]: B[J]
Out[122]: 
array([[161,  23, 999],
       [162,   0,  12]])

In [123]: C[K]
Out[123]: 
array([[161,  23, 545],
       [162,   0, 113]])
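
For convenience, the sample run can be collected into one self-contained script. This is only a restatement of the session above, with an extra check (my addition) that the first two columns of the three results agree.

```python
import numpy as np

A = np.array([[83, 12, 443], [83, 13, 565], [83, 14, 342],
              [161, 23, 431], [162, 0, 113], [162, 1, 313]])
B = np.array([[161, 23, 999], [5, 1, 13], [83, 12, 15],
              [162, 0, 12], [4, 3, 11]])
C = np.array([[11, 23, 143], [162, 0, 113], [161, 23, 545]])

# Pairwise key comparisons, then a joint mask over (row of A, row of B, row of C).
M1 = A[:, None, :2] == B[:, :2]
M2 = B[:, None, :2] == C[:, :2]
I, J, K = np.where((M1[:, :, None, :] & M2).all(3))

A_new, B_new, C_new = A[I], B[J], C[K]

# The key columns of the three results should now be identical.
same_keys = (np.array_equal(A_new[:, :2], B_new[:, :2])
             and np.array_equal(B_new[:, :2], C_new[:, :2]))
print(A_new, B_new, C_new, same_keys, sep="\n")
```

One caveat: if a key occurs more than once within an array, every matching combination produces its own (I, J, K) triple, so rows are repeated in the results.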

Answer 2

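The numpy_indexed package (disclaimer: I am its author) contains functionality for solving this kind of problem in an elegant and efficient/vectorized manner: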

import numpy as np
import numpy_indexed as npi

dyn = np.array(dyn)
us = np.array(us)

dyn_index = npi.as_index(dyn[:, :2])
us_index = npi.as_index(us[:, :2])

common = npi.intersection(dyn_index, us_index)
print(common)
print(dyn[npi.contains(common, dyn_index)])
print(us[npi.contains(common, us_index)])

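Note that performance is N log N in the worst case, and better than that insofar as the arguments to as_index are already in sorted order. By contrast, the currently accepted answer is quadratic in input size.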
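
If an extra dependency is not an option, the same sort-based idea can be sketched in plain NumPy for all three arrays. This is not part of the answer above. The helper name keep_common_rows and the hours_per_day parameter are hypothetical, and the sketch assumes that columns 0 and 1 hold integer-valued day and hour with 0 <= hour < hours_per_day, and that each key occurs at most once per array.

```python
from functools import reduce

import numpy as np


def keep_common_rows(*arrays, hours_per_day=24):
    """Keep only the rows whose (day, hour) key occurs in every array."""
    # Collapse the two key columns into a single integer per row.
    keys = [a[:, 0].astype(np.int64) * hours_per_day + a[:, 1].astype(np.int64)
            for a in arrays]
    # Keys present in every array; intersect1d returns sorted unique values.
    common = reduce(np.intersect1d, keys)
    result = []
    for a, k in zip(arrays, keys):
        mask = np.isin(k, common)
        # Sort the kept rows by key so row i has the same key in every output.
        order = np.argsort(k[mask], kind="stable")
        result.append(a[mask][order])
    return result


A = np.array([[83, 12, 443], [83, 13, 565], [83, 14, 342],
              [161, 23, 431], [162, 0, 113], [162, 1, 313]])
B = np.array([[161, 23, 999], [5, 1, 13], [83, 12, 15],
              [162, 0, 12], [4, 3, 11]])
C = np.array([[11, 23, 143], [162, 0, 113], [161, 23, 545]])

A_new, B_new, C_new = keep_common_rows(A, B, C)
print(A_new, B_new, C_new, sep="\n")
```

The work here is dominated by sorting, and no mask proportional to the product of the array lengths is built.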
