Grouping multiple columns and summing count in a pandas DataFrame
I have a table in a pandas DataFrame:
master_id pidx pidy flag count
xxx a b A 10
xxx a c A 20
xxx a d A 30
xxx b d A 40
xxx a c C 50
xxx a c C 60
xxx x y C 70
xxx x y C 80
I want to group on multiple columns and sum count regardless of flag.
For example, these rows:
xxx a c A 20
xxx a c C 50
xxx a c C 60
should be combined into:
xxx a c A 130
The final table should be:
master_id pidx pidy flag count
xxx a b A 10
xxx a c A 130
xxx a d A 30
xxx b d A 40
xxx x y C 150
I think you need groupby with agg, where column flag is aggregated by first and column count by sum:
df = df.groupby(['pidx', 'pidy']).agg({'flag': 'first', 'count': 'sum'}).reset_index()
print(df)
pidx pidy count flag
0 a b 10 A
1 a c 130 A
2 a d 30 A
3 b d 40 A
4 x y 150 C
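The output above drops master_id, which the requested final table includes. A minimal, self-contained sketch that keeps it, assuming master_id should simply be one more grouping key (the question does not say so explicitly) and using sample data copied from the question:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'master_id': ['xxx'] * 8,
    'pidx': ['a', 'a', 'a', 'b', 'a', 'a', 'x', 'x'],
    'pidy': ['b', 'c', 'd', 'd', 'c', 'c', 'y', 'y'],
    'flag': ['A', 'A', 'A', 'A', 'C', 'C', 'C', 'C'],
    'count': [10, 20, 30, 40, 50, 60, 70, 80],
})

# Group on the key columns only (assumption: master_id is a key too).
# flag is aggregated with 'first', so it does not split the groups.
result = (
    df.groupby(['master_id', 'pidx', 'pidy'], as_index=False)
      .agg({'flag': 'first', 'count': 'sum'})
)
print(result)
```

Note that 'first' keeps whichever flag appears first within each group in the original row order, so the (a, c) group reports A only because the A row comes before the C rows.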
Grouping by pidx, pidy and flag as well gives a different output, because rows that differ only in flag stay in separate groups:
df = df.groupby(['pidx', 'pidy', 'flag'], as_index=False)['count'].sum()
print(df)
pidx pidy flag count
0 a b A 10
1 a c A 20
2 a c C 110
3 a d A 30
4 b d A 40
5 x y C 150
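To see which key pairs are affected by this difference, one option is to count the distinct flag values per (pidx, pidy) pair. This is only an illustrative check on the question's sample data, not part of the original answer:

```python
import pandas as pd

# Sample data copied from the question (master_id omitted for brevity).
df = pd.DataFrame({
    'pidx': ['a', 'a', 'a', 'b', 'a', 'a', 'x', 'x'],
    'pidy': ['b', 'c', 'd', 'd', 'c', 'c', 'y', 'y'],
    'flag': ['A', 'A', 'A', 'A', 'C', 'C', 'C', 'C'],
    'count': [10, 20, 30, 40, 50, 60, 70, 80],
})

# Number of distinct flag values within each (pidx, pidy) pair.
flags_per_pair = df.groupby(['pidx', 'pidy'])['flag'].nunique()

# Pairs with more than one flag are the ones that split into
# several rows when flag is added to the grouping keys.
mixed = flags_per_pair[flags_per_pair > 1]
print(mixed)
```

Here only the (a, c) pair carries more than one flag, which is why it is the only pair whose result changes between the two groupings.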