
Remove rows from numpy array based on presence/absence in other arrays

I have 3 different numpy arrays, but they all start with two columns that contain the day of the year and the time. For example:

   dyn = [[  83   12   7.10555687e-01 ...,   6.99242766e-01   6.868761e-01]
         [  83   13   8.28091972e-01 ...,   8.33734118e-01   8.47266838e-01]
         [  83   14   8.79437354e-01 ...,   8.73598144e-01   8.57156213e-01]
         [  161   23   3.28109488e-01 ...,   2.83043689e-01  2.59775391e-01]
         [  162   0    2.23502046e-01 ...,   1.96972086e-01  1.65565263e-01]
         [  162   1   2.51653976e-01 ...,   2.17209188e-01   1.42133495e-1]]

   us = [[  133   18   3.00483815e+02 ...,   1.94277561e+00   2.8168959e+00]
        [  133   19   2.98832620e+02 ...,   2.42506475e+00   2.99730800e+00]
        [  133   20   2.96706105e+02 ...,   3.16851622e+00   4.41187088e+00]
        [  161   23   2.88336560e+02 ...,   3.44864070e-01   3.85055635e-01]
        [  162   0    2.87593240e+02 ...,   2.93002410e-01   2.67112490e-01]
        [  162   2    2.86992180e+02 ...,   7.08996730e-02   2.6403210e-01]]

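I need to be able to remove every row whose date and time are not present in all 3 arrays. In other words, I want to be left with 3 arrays whose first 2 columns are identical.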

So the resulting, smaller arrays would be:

dyn= [[  161   23   3.28109488e-01 ...,   2.83043689e-01  2.59775391e-01]
     [  162   0    2.23502046e-01 ...,   1.96972086e-01  1.65565263e-01]]

us= [[  161   23   2.88336560e+02 ...,   3.44864070e-01   3.85055635e-01]
    [  162   0    2.87593240e+02 ...,   2.93002410e-01   2.67112490e-01]]

(but then also restricted by the third array)

I have tried using sort / zip, but I am not sure how to apply it to a 2D array, e.g.:

X= dyn
Y = us
xsorted=[x for (y,x) in sorted(zip(Y[:,1],X[:,1]), key=lambda pair: pair[0])]

I also tried a loop, but it only works when the same time/date sits at the same position in each array, which doesn't help:

for i in range(100):
     dyn_small=dyn[dyn[:,0]==us[i,0]]

Assuming A, B and C as the input arrays, here's a vectorized approach that makes heavy use of broadcasting -

import numpy as np

# Get masks comparing all rows of A with B and then B with C
M1 = (A[:,None,:2] == B[:,:2])
M2 = (B[:,None,:2] == C[:,:2])

# Get a joint 3D mask of those two masks and get the indices of matches.
# These indices (I,J,K) of the 3D mask basically tell us the row numbers
# corresponding to each of the input arrays that are present in all of them.
# Thus, in (I,J,K), I would be the matching row number in A, J in B & K in C.
I,J,K = np.where((M1[:,:,None,:] & M2).all(3))

# Finally, select rows of A, B and C with I, J and K respectively
A_new = A[I]
B_new = B[J]
C_new = C[K]

Sample run -

1) Inputs:

In [116]: A
Out[116]: 
array([[ 83,  12, 443],
       [ 83,  13, 565],
       [ 83,  14, 342],
       [161,  23, 431],
       [162,   0, 113],
       [162,   1, 313]])

In [117]: B
Out[117]: 
array([[161,  23, 999],
       [  5,   1,  13],
       [ 83,  12,  15],
       [162,   0,  12],
       [  4,   3,  11]])

In [118]: C
Out[118]: 
array([[ 11,  23, 143],
       [162,   0, 113],
       [161,  23, 545]])

2) Run the solution code to get the matching row IDs and thus extract the rows:

In [119]: M1 = (A[:,None,:2] == B[:,:2])
     ...: M2 = (B[:,None,:2] == C[:,:2])
     ...: 

In [120]: I,J,K = np.where((M1[:,:,None,:] & M2).all(3))

In [121]: A[I]
Out[121]: 
array([[161,  23, 431],
       [162,   0, 113]])

In [122]: B[J]
Out[122]: 
array([[161,  23, 999],
       [162,   0,  12]])

In [123]: C[K]
Out[123]: 
array([[161,  23, 545],
       [162,   0, 113]])
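
To make the broadcasting in the approach above more concrete, here is a small sketch that rebuilds the sample inputs and prints the shape of each intermediate mask. The variable name joint is only introduced here for illustration. Note that the joint mask has one entry per (row of A, row of B, row of C) combination, so its memory use grows with the product of the three row counts.

```python
import numpy as np

# Sample inputs copied from the run above
A = np.array([[83, 12, 443], [83, 13, 565], [83, 14, 342],
              [161, 23, 431], [162, 0, 113], [162, 1, 313]])
B = np.array([[161, 23, 999], [5, 1, 13], [83, 12, 15],
              [162, 0, 12], [4, 3, 11]])
C = np.array([[11, 23, 143], [162, 0, 113], [161, 23, 545]])

# (6, 1, 2) compared with (5, 2) broadcasts to (6, 5, 2)
M1 = A[:, None, :2] == B[:, :2]
# (5, 1, 2) compared with (3, 2) broadcasts to (5, 3, 2)
M2 = B[:, None, :2] == C[:, :2]

# (6, 5, 1, 2) & (5, 3, 2) broadcasts to (6, 5, 3, 2);
# .all(3) requires both key columns to match, leaving shape (6, 5, 3)
joint = (M1[:, :, None, :] & M2).all(3)

print(M1.shape, M2.shape, joint.shape)
print(int(joint.sum()), "matching (A, B, C) row triples")
```

For large inputs this intermediate array may become the limiting factor, which is worth keeping in mind before applying the approach to long time series.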

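The numpy_indexed package (disclaimer: I am its author) contains functionality for solving this kind of problem in an elegant and efficient/vectorized manner: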

import numpy as np
import numpy_indexed as npi

dyn = np.array(dyn)
us = np.array(us)

dyn_index = npi.as_index(dyn[:, :2])
us_index = npi.as_index(us[:, :2])

common = npi.intersection(dyn_index, us_index)
print(common)
print(dyn[npi.contains(common, dyn_index)])
print(us[npi.contains(common, us_index)])

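Note that the performance is N log N in the worst case (less when the arguments to as_index are already in sorted order). By contrast, the currently accepted answer is quadratic in the input size.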
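
For readers who prefer to stay with plain NumPy, the same sort-based idea can be sketched roughly as follows. This is not how numpy_indexed is implemented. The helper names row_keys and keep_common_rows are hypothetical, and the sketch assumes that the first two columns hold non-negative whole numbers, that the second column is below 100, and that each (day, time) pair occurs at most once per array.

```python
import numpy as np
from functools import reduce

def row_keys(arr):
    """Pack the first two columns into one integer key per row (hypothetical helper).

    Assumes non-negative whole numbers and a second column below 100.
    """
    day = arr[:, 0].astype(np.int64)
    hour = arr[:, 1].astype(np.int64)
    return day * 100 + hour

def keep_common_rows(*arrays):
    """Keep only the rows whose key occurs in every array, ordered by key."""
    keys = [row_keys(a) for a in arrays]
    # np.intersect1d sorts internally, so chaining it stays sort-based
    common = reduce(np.intersect1d, keys)
    result = []
    for a, k in zip(arrays, keys):
        mask = np.isin(k, common)
        kept, kept_keys = a[mask], k[mask]
        # Sort by key so that row i refers to the same (day, time) in every output
        result.append(kept[np.argsort(kept_keys, kind="stable")])
    return result

A = np.array([[83, 12, 443], [83, 13, 565], [83, 14, 342],
              [161, 23, 431], [162, 0, 113], [162, 1, 313]])
B = np.array([[161, 23, 999], [5, 1, 13], [83, 12, 15],
              [162, 0, 12], [4, 3, 11]])
C = np.array([[11, 23, 143], [162, 0, 113], [161, 23, 545]])

A_new, B_new, C_new = keep_common_rows(A, B, C)
print(A_new)
print(B_new)
print(C_new)
```

Because the keys are compared by sorting rather than by an all-pairs mask, no array larger than the inputs is created, and the function accepts any number of arrays.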
