Suppose I have a DataFrame df that looks like this:
   school  score  student_id
0       1  100.0         965
1       2   64.0        1483
2       2  100.0        1055
3       2   68.0        1806
4       1  100.0         971
I want to count how many rows in each school group have the maximum score, and get something like this:
school count_max
1 2
2 1
How can I do it?
Assuming count_max means the number of rows where the score column equals 100, you can do:
# Flag rows with a score of 100, then sum the flags per school (True counts as 1)
df['max_score'] = df['score'] == 100
df.groupby('school')['max_score'].sum()
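As a minimal, self-contained sketch of this flag-and-sum idea, the snippet below rebuilds the sample data from the question and counts the flagged rows per school. The names is_max and count_max are only illustrative. It keeps the same assumption as the answer above, namely that the maximum is a fixed 100; a school whose best score is lower than 100 would get a count of 0.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "school": [1, 2, 2, 2, 1],
    "score": [100.0, 64.0, 100.0, 68.0, 100.0],
    "student_id": [965, 1483, 1055, 1806, 971],
})

# Boolean Series: True where the score equals the assumed maximum of 100
is_max = df["score"] == 100

# Summing booleans counts the True values within each school
count_max = (
    is_max.groupby(df["school"])
    .sum()
    .rename("count_max")
    .reset_index()
)
print(count_max)
```

Building the flag as a separate Series leaves df unchanged, which may be preferable when the helper column is not needed afterwards.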
You can combine df.groupby(), Series.agg(), Series.value_counts(), and Series.max() as follows:
(df.groupby('school')['score']
.agg(lambda x: x.value_counts().loc[x.max()])
.to_frame(name='count_max')
.astype(int)
.reset_index()
)
Explanation:
.agg() passes the score values of each school group to the lambda function as a pandas Series x. Calling x.value_counts() returns a Series that maps each unique score to the number of times it occurs, and .loc[x.max()] looks up the entry for the group's maximum score. The result is the number of rows in that group whose score equals the group maximum.
Output:
school count_max
0 1 2
1 2 1
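To make the explanation more concrete, here is a rough sketch that runs the body of the lambda by hand for a single group (school 1) on the sample data from the question. The variable names x, counts, group_max and count_max are only illustrative and stand in for what .agg() does internally for each group.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "school": [1, 2, 2, 2, 1],
    "score": [100.0, 64.0, 100.0, 68.0, 100.0],
    "student_id": [965, 1483, 1055, 1806, 971],
})

# The scores of one group, similar to what the lambda receives as x
x = df.loc[df["school"] == 1, "score"]

# Maps each unique score in the group to its number of occurrences
counts = x.value_counts()

# The largest score within this group
group_max = x.max()

# Look up how often the group's maximum occurs
count_max = counts.loc[group_max]
print(count_max)
```

Because the maximum is computed per group, this approach does not depend on the top score being 100 in every school.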