Filter rows of a pandas DataFrame based on multiple groupby conditions
I have a DataFrame:
import pandas as pd
from datetime import date

df = pd.DataFrame([
    [123, date(year=2021, month=12, day=1), 1, date(year=2021, month=12, day=6)],
    [123, date(year=2021, month=12, day=2), 3, date(year=2021, month=12, day=6)],
    [123, date(year=2021, month=12, day=3), 5, date(year=2021, month=12, day=6)],
    [123, date(year=2021, month=12, day=4), 0, date(year=2021, month=12, day=6)],
    [123, date(year=2021, month=12, day=5), 0, date(year=2021, month=12, day=6)],
    [123, date(year=2021, month=12, day=6), 0, date(year=2021, month=12, day=6)],
    [123, date(year=2021, month=12, day=7), 0, date(year=2021, month=12, day=6)],
    [123, date(year=2021, month=12, day=8), 0, date(year=2021, month=12, day=6)],
    [123, date(year=2021, month=12, day=9), 0, date(year=2021, month=12, day=6)],
    [123, date(year=2021, month=12, day=10), 0, date(year=2021, month=12, day=6)],
    [456, date(year=2021, month=12, day=1), 1, date(year=2021, month=12, day=11)],
    [456, date(year=2021, month=12, day=2), 3, date(year=2021, month=12, day=11)],
    [456, date(year=2021, month=12, day=3), 5, date(year=2021, month=12, day=11)],
    [456, date(year=2021, month=12, day=4), 0, date(year=2021, month=12, day=11)],
    [456, date(year=2021, month=12, day=5), 0, date(year=2021, month=12, day=11)],
    [456, date(year=2021, month=12, day=6), 2, date(year=2021, month=12, day=11)],
    [456, date(year=2021, month=12, day=7), 3, date(year=2021, month=12, day=11)],
    [456, date(year=2021, month=12, day=8), 0, date(year=2021, month=12, day=11)],
    [456, date(year=2021, month=12, day=9), 0, date(year=2021, month=12, day=11)],
    [456, date(year=2021, month=12, day=10), 0, date(year=2021, month=12, day=11)],
], columns=['ID', 'date', 'value', 'Matdate'])
It looks like this:
     ID        date  value     Matdate
0   123  2021-12-01      1  2021-12-06
1   123  2021-12-02      3  2021-12-06
2   123  2021-12-03      5  2021-12-06
3   123  2021-12-04      0  2021-12-06
4   123  2021-12-05      0  2021-12-06
5   123  2021-12-06      0  2021-12-06
6   123  2021-12-07      0  2021-12-06
7   123  2021-12-08      0  2021-12-06
8   123  2021-12-09      0  2021-12-06
9   123  2021-12-10      0  2021-12-06
10  456  2021-12-01      1  2021-12-11
11  456  2021-12-02      3  2021-12-11
12  456  2021-12-03      5  2021-12-11
13  456  2021-12-04      0  2021-12-11
14  456  2021-12-05      0  2021-12-11
15  456  2021-12-06      2  2021-12-11
16  456  2021-12-07      3  2021-12-11
17  456  2021-12-08      0  2021-12-11
18  456  2021-12-09      0  2021-12-11
19  456  2021-12-10      0  2021-12-11
The desired DataFrame is:
     ID        date  value     Matdate
0   123  2021-12-01      1  2021-12-06
1   123  2021-12-02      3  2021-12-06
2   123  2021-12-03      5  2021-12-06
3   123  2021-12-04      0  2021-12-06
4   123  2021-12-05      0  2021-12-06
5   123  2021-12-06      0  2021-12-06
10  456  2021-12-01      1  2021-12-11
11  456  2021-12-02      3  2021-12-11
12  456  2021-12-03      5  2021-12-11
13  456  2021-12-04      0  2021-12-11
14  456  2021-12-05      0  2021-12-11
15  456  2021-12-06      2  2021-12-11
16  456  2021-12-07      3  2021-12-11
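Basically, I need to drop rows within each ID group based on two conditions: the value column contains only 0s from that row up to the group's last date, and the row's date is later than Matdate. If Matdate is less than or equal to date, do not drop it. So far I have tried something like the following: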
df.drop( df[(df.iloc[::-1].groupby('ID')['value'].cumsum().iloc[::-1].ne(0) == False) & df.groupby('ID')['date'] > df.groupby('ID')['Matdate'] ].index )
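The reverse-cumsum part of this attempt is what finds the trailing zeros: reversing the rows, taking a running sum within each ID, and reversing back gives every row the sum of its own value and all later values in its group. A minimal sketch on hypothetical toy data (the frame `toy` and the columns `trailing_zero`, `nz` and `trailing_zero_count` are illustrative, not from the question). Note that "remaining sum is 0" only means "only zeros remain" when `value` is never negative, as in the question; the variant at the end counts nonzero entries instead, which avoids that assumption.

```python
import pandas as pd

# Hypothetical toy data (not from the question): two ID groups.
toy = pd.DataFrame({
    'ID':    [1, 1, 1, 1, 2, 2, 2],
    'value': [4, 0, 2, 0, 0, 0, 0],
})

# Reverse the rows, take a running sum of `value` within each ID, then
# reverse back: each row now holds the sum of its own value and every
# later value in the same group.
remaining_sum = toy.iloc[::-1].groupby('ID')['value'].cumsum().iloc[::-1]

# A remaining sum of 0 means "only zeros from here to the group's end",
# which is valid as long as `value` is never negative.
toy['trailing_zero'] = remaining_sum.eq(0)

# Variant that does not rely on non-negative values: count the nonzero
# entries from each row to the end of its group instead of summing values.
nonzero = toy['value'].ne(0).astype(int)
remaining_nonzero = (
    toy.assign(nz=nonzero).iloc[::-1].groupby('ID')['nz'].cumsum().iloc[::-1]
)
toy['trailing_zero_count'] = remaining_nonzero.eq(0)

print(toy)
```

In group 1 only the last row is flagged, because a nonzero value (2) still follows rows 0 to 2; in group 2 every row is flagged.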
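You can compare the columns directly, without grouping, so your solution works once groupby is removed from the date condition. df.groupby('ID')['date'] is a GroupBy object rather than a Series, so it cannot be compared element-wise the way a column can. Also, & binds more tightly than >, so Python reads A & B > C as (A & B) > C; the date comparison therefore needs its own parentheses. Finally, .ne(0) == False is simply .eq(0).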
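The precedence issue can be checked with Python's standard `ast` module, which shows how an expression is parsed without evaluating it or needing any data. The names `mask`, `grouped_date`, `grouped_matdate`, `date` and `matdate` below are placeholders only:

```python
import ast

# Same shape as the attempted filter: mask & grouped_date > grouped_matdate
tree = ast.parse("mask & grouped_date > grouped_matdate", mode="eval")

# The top-level node is a comparison whose left side is the `&` operation,
# i.e. Python reads it as (mask & grouped_date) > grouped_matdate.
print(type(tree.body).__name__)          # Compare
print(type(tree.body.left).__name__)     # BinOp
print(type(tree.body.left.op).__name__)  # BitAnd

# With explicit parentheses, `&` becomes the top-level operation and the
# comparison is evaluated first, as intended.
fixed = ast.parse("mask & (date > matdate)", mode="eval")
print(type(fixed.body).__name__)         # BinOp
```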
Corrected code:
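# rows that are trailing zeros within their ID group AND whose date is after Matdate
df1 = df[(df.iloc[::-1].groupby('ID')['value'].cumsum().iloc[::-1].eq(0)) & (df['date'] > df['Matdate'])]
df = df.drop(df1.index)  # drop returns a new DataFrame, so assign it back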
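A self-contained sketch of the corrected filter, run on the question's data. The data is rebuilt compactly with list repetition instead of the nested list above, and the names `days`, `trailing_zeros`, `after_matdate` and `result` are illustrative. The two masks are split out only to make each condition visible; the logic is the same as the one-line version.

```python
import pandas as pd
from datetime import date

# The question's data: days 1-10 of December 2021 for each ID.
days = [date(2021, 12, d) for d in range(1, 11)]
df = pd.DataFrame({
    'ID': [123] * 10 + [456] * 10,
    'date': days * 2,
    'value': [1, 3, 5, 0, 0, 0, 0, 0, 0, 0] + [1, 3, 5, 0, 0, 2, 3, 0, 0, 0],
    'Matdate': [date(2021, 12, 6)] * 10 + [date(2021, 12, 11)] * 10,
})

# Condition 1: value is 0 from this row to the end of its ID group
# (reverse cumulative sum; relies on values being non-negative, as here).
trailing_zeros = df.iloc[::-1].groupby('ID')['value'].cumsum().iloc[::-1].eq(0)

# Condition 2: the row's date is strictly after its Matdate
# (element-wise comparison of datetime.date objects, no groupby needed).
after_matdate = df['date'] > df['Matdate']

result = df.drop(df[trailing_zeros & after_matdate].index)
print(result.index.tolist())
# [0, 1, 2, 3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
```

On this data the filter drops rows 6 to 9 of ID 123. Rows 17 to 19 of ID 456 are trailing zeros as well but are kept, because their dates fall before that group's Matdate (2021-12-11); the desired output in the question omits them, so the two conditions as written do not reproduce that part of it.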
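Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.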