I am trying to first group by id, then count, for each id, how many rows have score > avg.
dataframe:
id col1 avg score
a 1 3 3
a 0 4 3
a 1 3 5
b 1 2 4
b 1 4 5
want:
id score>avg total
a 1 3
b 2 2
my code:
df2 = df.groupby('id', as_index=False)[['score'] > ['avg']].agg({'score>avg': 'count', 'total': 'count'})
Error I got:
KeyError: 'Column not found: False'
I am not sure what I should change the [['score'] > ['avg']] portion to.
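One thing you can do is first create a column of boolean values that indicates whether score is greater than avg, then group by 'id' and take the sum and the count of that new column. Summing booleans counts the True rows, and the count gives the total number of rows per id.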
df['score_gt_avg'] = df.score > df.avg
df.groupby('id')['score_gt_avg'].agg([('score>avg', 'sum'),('total', 'count')])
score>avg total
id
a 1.0 3
b 2.0 2
Equivalently, you can do this in one line, without adding a column to df:
df.score.gt(df.avg).groupby(df.id).agg([('score>avg', 'sum'),('total', 'count')])
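For reference, here is a self-contained sketch of the same idea using named aggregation, which I believe requires pandas 0.25 or later. The frame is rebuilt by hand from the sample table in the question, and score_gt_avg is just the helper column name used above; treat this as one possible spelling rather than the only way to do it.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "id": ["a", "a", "a", "b", "b"],
    "col1": [1, 0, 1, 1, 1],
    "avg": [3, 4, 3, 2, 4],
    "score": [3, 3, 5, 4, 5],
})

result = (
    # Boolean helper column: True where score exceeds avg on that row.
    df.assign(score_gt_avg=df["score"] > df["avg"])
      .groupby("id", as_index=False)
      # Named aggregation: output_name=(input_column, function).
      # The ** unpacking is only needed because "score>avg" is not a
      # valid Python keyword argument name.
      .agg(**{
          "score>avg": ("score_gt_avg", "sum"),    # number of True rows
          "total": ("score_gt_avg", "count"),      # number of rows per id
      })
)
print(result)
```

With as_index=False, id stays a regular column, so the result has the same shape as the "want" table: 1 of 3 rows for a and 2 of 2 rows for b.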