Remove rows from numpy array based on presence/absence in other arrays
I have 3 different numpy arrays, but they all start with two columns that contain the date and the time. For example:
dyn = [[ 83 12 7.10555687e-01 ..., 6.99242766e-01 6.868761e-01]
[ 83 13 8.28091972e-01 ..., 8.33734118e-01 8.47266838e-01]
[ 83 14 8.79437354e-01 ..., 8.73598144e-01 8.57156213e-01]
[ 161 23 3.28109488e-01 ..., 2.83043689e-01 2.59775391e-01]
[ 162 0 2.23502046e-01 ..., 1.96972086e-01 1.65565263e-01]
[ 162 1 2.51653976e-01 ..., 2.17209188e-01 1.42133495e-1]]
us = [[ 133 18 3.00483815e+02 ..., 1.94277561e+00 2.8168959e+00]
[ 133 19 2.98832620e+02 ..., 2.42506475e+00 2.99730800e+00]
[ 133 20 2.96706105e+02 ..., 3.16851622e+00 4.41187088e+00]
[ 161 23 2.88336560e+02 ..., 3.44864070e-01 3.85055635e-01]
[ 162 0 2.87593240e+02 ..., 2.93002410e-01 2.67112490e-01]
[ 162 2 2.86992180e+02 ..., 7.08996730e-02 2.6403210e-01]]
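I need to be able to remove every row whose date and time are not present in all 3 arrays. In other words, I want to be left with 3 arrays in which the first 2 columns are identical.
So the resulting smaller arrays would be: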
dyn= [[ 161 23 3.28109488e-01 ..., 2.83043689e-01 2.59775391e-01]
[ 162 0 2.23502046e-01 ..., 1.96972086e-01 1.65565263e-01]]
us= [[ 161 23 2.88336560e+02 ..., 3.44864070e-01 3.85055635e-01]
[ 162 0 2.87593240e+02 ..., 2.93002410e-01 2.67112490e-01]]
(but then also limited by the third array)
I have tried using sort/zip, but I am not sure how to apply it to a 2D array, for example:
X= dyn
Y = us
xsorted=[x for (y,x) in sorted(zip(Y[:,1],X[:,1]), key=lambda pair: pair[0])]
I also tried a loop, but it only works when the same time/date sits at the same position in each array, which does not help:
for i in range(100):
    dyn_small=dyn[dyn[:,0]==us[i,0]]
Assuming A, B and C are the input arrays, here is a vectorized approach that makes heavy use of broadcasting:
import numpy as np

# Get masks comparing all rows of A with B and then B with C
M1 = (A[:,None,:2] == B[:,:2])
M2 = (B[:,None,:2] == C[:,:2])
# Get a joint 3D mask of those two masks and get the indices of matches.
# These indices (I,J,K) of the 3D mask tell us the row numbers
# corresponding to each of the input arrays that are present in all of them.
# Thus, in (I,J,K), I would be the matching row number in A, J in B & K in C.
I,J,K = np.where((M1[:,:,None,:] & M2).all(3))
# Finally, select rows of A, B and C with I, J and K respectively
A_new = A[I]
B_new = B[J]
C_new = C[K]
Sample run:
1) Inputs:
In [116]: A
Out[116]:
array([[ 83, 12, 443],
[ 83, 13, 565],
[ 83, 14, 342],
[161, 23, 431],
[162, 0, 113],
[162, 1, 313]])
In [117]: B
Out[117]:
array([[161, 23, 999],
[ 5, 1, 13],
[ 83, 12, 15],
[162, 0, 12],
[ 4, 3, 11]])
In [118]: C
Out[118]:
array([[ 11, 23, 143],
[162, 0, 113],
[161, 23, 545]])
2) Run the solution code to get the matching row IDs and then extract the rows:
In [119]: M1 = (A[:,None,:2] == B[:,:2])
...: M2 = (B[:,None,:2] == C[:,:2])
...:
In [120]: I,J,K = np.where((M1[:,:,None,:] & M2).all(3))
In [121]: A[I]
Out[121]:
array([[161, 23, 431],
[162, 0, 113]])
In [122]: B[J]
Out[122]:
array([[161, 23, 999],
[162, 0, 12]])
In [123]: C[K]
Out[123]:
array([[161, 23, 545],
[162, 0, 113]])
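To make the broadcasting step easier to follow, here is a minimal sketch that records the shape of each intermediate mask. The small arrays are made-up illustrative values (not the question's data), and they are given different lengths on purpose so the axes can be told apart.

```python
import numpy as np

# Illustrative inputs (hypothetical values); the first two columns are the keys.
A = np.array([[83, 12, 443], [161, 23, 431], [162, 0, 113]])            # 3 rows
B = np.array([[161, 23, 999], [5, 1, 13], [162, 0, 12], [4, 3, 11]])    # 4 rows
C = np.array([[11, 23, 143], [162, 0, 113]])                            # 2 rows

# Compare every row of A with every row of B, column by column.
M1 = A[:, None, :2] == B[:, :2]    # shape (len(A), len(B), 2)
# Compare every row of B with every row of C, column by column.
M2 = B[:, None, :2] == C[:, :2]    # shape (len(B), len(C), 2)

# Insert a new axis for C into M1, then AND with M2 (broadcast over A).
# joint[i, j, k] is True when A[i], B[j] and C[k] share both key columns.
joint = (M1[:, :, None, :] & M2).all(3)    # shape (len(A), len(B), len(C))

I, J, K = np.where(joint)
print(M1.shape, M2.shape, joint.shape)
print(A[I], B[J], C[K])
```

The intermediate array before `.all(3)` holds `len(A) * len(B) * len(C) * 2` booleans, so memory use grows with the product of the three row counts. That is worth checking before applying this approach to long arrays.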
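The numpy_indexed package (disclaimer: I am its author) contains functionality to solve this kind of problem in an elegant and efficient/vectorized manner: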
import numpy as np
import numpy_indexed as npi
dyn = np.array(dyn)
us = np.array(us)
dyn_index = npi.as_index(dyn[:, :2])
us_index = npi.as_index(us[:, :2])
common = npi.intersection(dyn_index, us_index)
print(common)
print(dyn[npi.contains(common, dyn_index)])
print(us[npi.contains(common, us_index)])
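Note that performance is N log N in the worst case, and linear if the arguments to as_index are already in sorted order. By contrast, the currently accepted answer is quadratic in the input size.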
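For readers who want a sort-based solution without an extra dependency, the following is a rough sketch in plain NumPy. It is not the numpy_indexed implementation; it swaps in a different technique, collapsing the two key columns into one integer and using `np.intersect1d` and `np.isin`. The helper names `row_keys` and `keep_common_rows` and the `base` parameter are hypothetical. The sketch assumes both key columns hold non-negative whole numbers and that the second column is always smaller than `base`.

```python
from functools import reduce

import numpy as np


def row_keys(arr, base=100):
    """Collapse the first two columns into one integer key per row.

    Assumption: both columns are non-negative whole numbers and the
    second column is always smaller than `base`.
    """
    return arr[:, 0].astype(np.int64) * base + arr[:, 1].astype(np.int64)


def keep_common_rows(*arrays, base=100):
    """Keep only rows whose (col 0, col 1) pair occurs in every array."""
    keys = [row_keys(a, base) for a in arrays]
    # Sorted unique keys that are present in all of the arrays.
    common = reduce(np.intersect1d, keys)
    result = []
    for a, k in zip(arrays, keys):
        mask = np.isin(k, common)
        # Sort the surviving rows by key so the outputs line up row by row.
        order = np.argsort(k[mask], kind="stable")
        result.append(a[mask][order])
    return result


A = np.array([[83, 12, 443], [83, 13, 565], [83, 14, 342],
              [161, 23, 431], [162, 0, 113], [162, 1, 313]])
B = np.array([[161, 23, 999], [5, 1, 13], [83, 12, 15],
              [162, 0, 12], [4, 3, 11]])
C = np.array([[11, 23, 143], [162, 0, 113], [161, 23, 545]])

A_new, B_new, C_new = keep_common_rows(A, B, C)
print(A_new, B_new, C_new, sep="\n")
```

Because the work is done by sorting and set operations on 1D keys, no array of size `len(A) * len(B) * len(C)` is ever built. If a key pair is repeated within one array, every repeated row is kept, so the outputs may then differ in length.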