Grouping by multiple columns and summing count in a pandas df

I have a table in a pandas df:

 master_id    pidx   pidy   flag   count
    xxx        a      b       A      10
    xxx        a      c       A      20
    xxx        a      d       A      30
    xxx        b      d       A      40
    xxx        a      c       C      50
    xxx        a      c       C      60
    xxx        x      y       C      70
    xxx        x      y       C      80
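To make the question reproducible, the table above can be rebuilt as a DataFrame. This is a minimal sketch: the values are copied from the table, and the column names are taken from its header.

```python
import pandas as pd

# Rebuild the example table from the question, row by row.
df = pd.DataFrame({
    "master_id": ["xxx"] * 8,
    "pidx":  ["a", "a", "a", "b", "a", "a", "x", "x"],
    "pidy":  ["b", "c", "d", "d", "c", "c", "y", "y"],
    "flag":  ["A", "A", "A", "A", "C", "C", "C", "C"],
    "count": [10, 20, 30, 40, 50, 60, 70, 80],
})
print(df)
```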

I want to group by multiple columns and sum count irrespective of flag, so that rows which differ only in flag are merged into one.

For example, these rows differ only in flag:

 xxx  a    c   A   20
 xxx  a    c   C   50
 xxx  a    c   C   60

The output for them should be:

 xxx  a   c   A  130
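The 130 is just the sum of the three a/c rows (20 + 50 + 60), regardless of their flag. A minimal sketch that checks this by filtering those rows directly; `mask`, `subset` and `total` are illustrative names, not part of the question.

```python
import pandas as pd

df = pd.DataFrame({
    "master_id": ["xxx"] * 8,
    "pidx":  ["a", "a", "a", "b", "a", "a", "x", "x"],
    "pidy":  ["b", "c", "d", "d", "c", "c", "y", "y"],
    "flag":  ["A", "A", "A", "A", "C", "C", "C", "C"],
    "count": [10, 20, 30, 40, 50, 60, 70, 80],
})

# Illustrative names: select the rows sharing master_id/pidx/pidy = xxx/a/c,
# whatever their flag.
mask = (df["master_id"] == "xxx") & (df["pidx"] == "a") & (df["pidy"] == "c")
subset = df[mask]
print(subset)

# Their counts collapse into one total: 20 + 50 + 60.
total = subset["count"].sum()
print(total)  # 130
```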

The final table should be:

 master_id   pidx   pidy   flag   count
    xxx        a      b       A      10
    xxx        a      c       A      130
    xxx        a      d       A      30
    xxx        b      d       A      40
    xxx        x      y       C      150

I think you need groupby with agg: column flag is aggregated by first (each group keeps the flag of its first row) and column count by sum:

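import pandas as pd

# group on pidx/pidy only: flag keeps the first row's value, count is summed;
# assigning to a new name keeps the original df intact for the next snippet
out = df.groupby(['pidx','pidy']).agg({'flag':'first', 'count':'sum'}).reset_index()
print(out)
  pidx pidy flag  count
0    a    b    A     10
1    a    c    A    130
2    a    d    A     30
3    b    d    A     40
4    x    y    C    150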
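The groupby above drops master_id, which the desired final table keeps. A minimal sketch, assuming master_id should simply be part of the group key: adding it to the keys preserves it, and named aggregation (available in pandas 0.25 and later) fixes the output column order explicitly. `result` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "master_id": ["xxx"] * 8,
    "pidx":  ["a", "a", "a", "b", "a", "a", "x", "x"],
    "pidy":  ["b", "c", "d", "d", "c", "c", "y", "y"],
    "flag":  ["A", "A", "A", "A", "C", "C", "C", "C"],
    "count": [10, 20, 30, 40, 50, 60, 70, 80],
})

# master_id is part of the key, so it survives into the result.
# Named aggregation: output column name = (input column, aggregation function).
result = (
    df.groupby(["master_id", "pidx", "pidy"])
      .agg(flag=("flag", "first"), count=("count", "sum"))
      .reset_index()
)
print(result)
```

Because groupby sorts the keys by default, the rows come out in the same order as the final table in the question.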

If you group by pidx, pidy and flag instead, the output is different, because flag becomes part of the key and the a/c rows stay split into an A group and a C group:

out = df.groupby(['pidx','pidy','flag'], as_index=False)['count'].sum()
print(out)
  pidx pidy flag  count
0    a    b    A     10
1    a    c    A     20
2    a    c    C    110
3    a    d    A     30
4    b    d    A     40
5    x    y    C    150
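The two results are related: the flag-keyed version just holds partial sums. A minimal sketch showing that summing those partial totals again over pidx/pidy drops flag and recovers the flag-independent totals; `by_flag` and `totals` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "master_id": ["xxx"] * 8,
    "pidx":  ["a", "a", "a", "b", "a", "a", "x", "x"],
    "pidy":  ["b", "c", "d", "d", "c", "c", "y", "y"],
    "flag":  ["A", "A", "A", "A", "C", "C", "C", "C"],
    "count": [10, 20, 30, 40, 50, 60, 70, 80],
})

# With flag in the key, (a, c) splits into two groups: A = 20 and C = 110.
by_flag = df.groupby(["pidx", "pidy", "flag"], as_index=False)["count"].sum()

# Re-summing over (pidx, pidy) ignores flag and gives (a, c) = 130 again.
totals = by_flag.groupby(["pidx", "pidy"], as_index=False)["count"].sum()
print(by_flag)
print(totals)
```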
