Python Pandas group by then filter on condition

Question

I am trying to group by id and then count, for each id, how many rows have score > avg.

dataframe:

id col1  avg  score
a   1     3    3
a   0     4    3
a   1     3    5
b   1     2    4
b   1     4    5

want:

id score>avg total
a    1       3
b    2       2

my code:

df2 = df.groupby('id', as_index=False)[['score'] > ['avg']].agg({'score>avg': 'count', 'total': 'count'})

Error I got:

KeyError: 'Column not found: False'

I am not sure what I should change the [['score'] > ['avg']] portion to.

Answer 1

One approach is to first create a boolean column indicating whether score is greater than avg, then group by 'id' and apply both sum (the number of True rows) and count (the total number of rows) to that new column.

df['score_gt_avg'] = df.score > df.avg
df.groupby('id')['score_gt_avg'].agg([('score>avg', 'sum'),('total', 'count')])

    score>avg  total
id                  
a         1.0      3
b         2.0      2
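
For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the sample data from the question and applies the same two-step idea. The names `flagged` and `summary` are only illustrative, and the keyword form of `agg` (named aggregation) assumes pandas 0.25 or later. The cast to int is there because some pandas versions return the sum of booleans as floats.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": ["a", "a", "a", "b", "b"],
    "col1": [1, 0, 1, 1, 1],
    "avg": [3, 4, 3, 2, 4],
    "score": [3, 3, 5, 4, 5],
})

# Step 1: row-wise boolean column, True where score exceeds avg.
# assign() returns a copy, so the original df is left unchanged.
flagged = df.assign(score_gt_avg=df["score"] > df["avg"])

# Step 2: per id, sum the booleans (number of True rows) and count all rows.
# The ** unpacking is needed because "score>avg" is not a valid keyword name.
summary = (
    flagged.groupby("id")["score_gt_avg"]
    .agg(**{"score>avg": "sum", "total": "count"})
    .reset_index()
)

# Make the count of True rows an integer regardless of pandas version.
summary["score>avg"] = summary["score>avg"].astype(int)

print(summary)
```

The reset_index() call turns id back into an ordinary column, which matches the layout asked for in the question.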

Equivalently, you can do this in one line:

df.score.gt(df.avg).groupby(df.id).agg([('score>avg', 'sum'),('total', 'count')])
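
As for why the attempt in the question fails: `['score'] > ['avg']` compares two plain Python lists of strings, so it evaluates to a single boolean before pandas ever sees it, and the groupby is then asked to select a column named by that boolean. The comparison has to be made on the columns themselves. A small sketch of the difference, using the question's data (the variable names are only illustrative):

```python
import pandas as pd

# Sample data copied from the question (col1 omitted, it is not used here).
df = pd.DataFrame({
    "id": ["a", "a", "a", "b", "b"],
    "avg": [3, 4, 3, 2, 4],
    "score": [3, 3, 5, 4, 5],
})

# Comparing two Python lists of strings gives one bool, not a row-wise mask.
list_comparison = ["score"] > ["avg"]
print(type(list_comparison))

# Comparing the columns gives a boolean Series with one value per row,
# which is what the groupby-and-aggregate step needs.
mask = df["score"] > df["avg"]
print(mask.tolist())
```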

solution1
2 ACCEPTED 2017-01-17 05:58:48
