pandas groupby on row condition
I have a sample dataset:
import pandas as pd
d = {
'H#': ['12843','12843','12843','12843','20000','20000','20000','20000','20000'],
'measure':[1,1,1,3,3,3,3,2,2],
'D':[1,0,2,1,1,1,2,1,1],
'N':[2,3,1,4,5,0,0,0,2]
}
df = pd.DataFrame(d)
# reindex_axis was removed in pandas 1.0; reindex(columns=...) fixes the column order
df = df.reindex(columns=['H#', 'measure', 'D', 'N'])
It looks like this:
H# measure D N
0 12843 1 1 2
1 12843 1 0 3
2 12843 1 2 1
3 12843 3 1 4
4 20000 3 1 5
5 20000 3 1 0
6 20000 3 2 0
7 20000 2 1 0
8 20000 2 1 2
I want to apply groupby by 'H#' and 'measure' only to the rows where measure is not 3, summing 'D' and 'N'. The desired output:
H# measure D N
0 12843 1 3 6
3 12843 3 1 4
4 20000 3 1 5
5 20000 3 1 0
6 20000 3 2 0
7 20000 2 2 2
My attempt:
mask=df["measure"]!=3 #first to mask the rows for the groupby
#the following line has the wrong syntax, how can i apply groupby to the masked dataset?
df.loc[mask,]= df.loc[mask,].groupby(['H#','measure'],as_index=False)['D','N'].sum()
The last line of code has the wrong syntax. How can I apply groupby to the masked dataset?
IIUC:
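In [90]: # DataFrame.append was removed in pandas 2.0, so pd.concat is used here
    ...: pd.concat([df[df.measure != 3]
    ...:                .groupby(['H#', 'measure'], as_index=False)
    ...:                .sum(),
    ...:            df.loc[df.measure == 3]])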
Out[90]:
H# measure D N
0 12843 1 3 6
1 20000 2 2 2
3 12843 3 1 4
4 20000 3 1 5
5 20000 3 1 0
6 20000 3 2 0
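You can split your df, group one part, and then concatenate the pieces:
pd.concat([df.query('measure == 3'),
           df.query('measure != 3')
             .groupby(['H#', 'measure'], as_index=False)[['D', 'N']]  # list selection; a bare 'D','N' tuple fails on pandas 2.x
             .agg('sum')])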
Output:
H# measure D N
3 12843 3 1 4
4 20000 3 1 5
5 20000 3 1 0
6 20000 3 2 0
0 12843 1 3 6
1 20000 2 2 2
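Neither result above keeps the row order of the desired output (index 0, 3, 4, 5, 6, 7). If that order matters, one possible sketch is to carry the original index through the groupby and sort on it afterwards. This assumes each summed group should take the position of its first original row; the names mask, summed and out are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'H#': ['12843', '12843', '12843', '12843', '20000', '20000', '20000', '20000', '20000'],
    'measure': [1, 1, 1, 3, 3, 3, 3, 2, 2],
    'D': [1, 0, 2, 1, 1, 1, 2, 1, 1],
    'N': [2, 3, 1, 4, 5, 0, 0, 0, 2],
})

mask = df['measure'] != 3  # rows that should be aggregated

# Sum D and N per (H#, measure) and remember the first original index of each group.
summed = (df[mask]
          .reset_index()  # exposes the original index as a column named 'index'
          .groupby(['H#', 'measure'], as_index=False)
          .agg({'index': 'first', 'D': 'sum', 'N': 'sum'})
          .set_index('index')
          .rename_axis(None))

# Put the untouched measure == 3 rows back and restore the original ordering.
out = pd.concat([summed, df[~mask]]).sort_index()
print(out)
```

With the sample data this should give one row per aggregated group at index 0 and 7, and the measure == 3 rows unchanged at index 3 to 6.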