
Python Pandas group by then filter on condition

I am trying to group by id and then count, for each id, how many rows have score > avg.

dataframe:

id col1  avg  score
a   1     3    3
a   0     4    3
a   1     3    5
b   1     2    4
b   1     4    5

Desired output:

id score>avg total
a    1       3
b    2       2

My code:

df2 = df.groupby('id', as_index=False)[['score'] > ['avg']].agg({'score>avg': 'count', 'total': 'count'})

The error I got:

KeyError: 'Column not found: False'

I am not sure what I should change the [['score'] > ['avg']] portion to.

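The expression ['score'] > ['avg'] compares two Python lists and evaluates to a single boolean, which pandas then tries to look up as a column name, hence the KeyError. Instead, first create a boolean column that indicates whether score is greater than avg, then group by 'id' and apply sum and count to that column: sum counts the True values and count counts all rows.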

df['score_gt_avg'] = df.score > df.avg
df.groupby('id')['score_gt_avg'].agg([('score>avg', 'sum'),('total', 'count')])

    score>avg  total
id                  
a         1.0      3
b         2.0      2

Equivalently, you can do it in one line without adding a column:

df.score.gt(df.avg).groupby(df.id).agg([('score>avg', 'sum'),('total', 'count')])
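
For completeness, here is a self-contained sketch of the same idea that rebuilds the sample data from the question and returns id as a regular column, as in the desired output. It uses keyword-based named aggregation instead of the list of tuples, which assumes pandas 0.25 or later; the names above and result are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": ["a", "a", "a", "b", "b"],
    "col1": [1, 0, 1, 1, 1],
    "avg": [3, 4, 3, 2, 4],
    "score": [3, 3, 5, 4, 5],
})

# Boolean Series: True where score exceeds avg on that row.
above = df["score"] > df["avg"]

# Group the booleans by id: sum counts the True values, count counts all rows.
# The dict is unpacked because "score>avg" is not a valid keyword name.
result = (
    above.groupby(df["id"])
    .agg(**{"score>avg": "sum", "total": "count"})
    .reset_index()
)
print(result)
```

The reset_index() call moves id out of the index, which is what as_index=False was meant to do in the original attempt.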
