Count maximum value in each group with Pandas

Suppose I have a DataFrame df that looks like this:

   school  score  student_id
0       1  100.0         965
1       2   64.0        1483
2       2  100.0        1055
3       2   68.0        1806
4       1  100.0         971

I want to count how many times the maximum score occurs in each group and get something like this:

school count_max
  1       2
  2       1

How can I do it?

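Assuming count_max means the number of rows where the score column equals 100, you can do:

# Flag rows whose score equals 100 (assumed to be the maximum)
df['max_score'] = df['score'] == 100
# Summing the boolean flags counts the maximum scores per school
df.groupby('school')['max_score'].sum()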
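
If the maximum is not known to be 100 in every school, the same idea can compare each score against its own group's maximum instead of a fixed value. A minimal sketch, assuming the sample data from the question (the names group_max, is_max and count_max are illustrative, not from the original answer):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "school": [1, 2, 2, 2, 1],
    "score": [100.0, 64.0, 100.0, 68.0, 100.0],
    "student_id": [965, 1483, 1055, 1806, 971],
})

# Broadcast each school's maximum score back onto its rows.
group_max = df.groupby("school")["score"].transform("max")

# True where a row holds its school's maximum score.
is_max = df["score"].eq(group_max)

# Summing booleans per school counts the rows at the maximum.
count_max = (
    is_max.groupby(df["school"])
          .sum()
          .rename("count_max")
          .reset_index()
)
print(count_max)
```

This avoids adding a helper column to df and does not depend on the maximum being 100.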

You can use df.groupby(), Series.agg(), Series.value_counts() and Series.max() as follows:

(df.groupby('school')['score']
   .agg(lambda x: x.value_counts().loc[x.max()])
   .to_frame(name='count_max')
   .astype(int)
   .reset_index()
)

Explanation:

Here, x.value_counts().loc[x.max()] operates on x (passed by .agg() to the lambda function), which is the part of the score column belonging to one school group. value_counts() returns the count of each unique value as a pandas Series indexed by those values, so .loc[x.max()] looks up the entry for the group's maximum. The result is the number of rows in that group whose score equals the maximum.
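
To see the intermediate values, the lambda's steps can be run by hand on a single group. A small sketch, assuming x holds the scores of school 2 from the sample data (the variable names counts and n_max are illustrative):

```python
import pandas as pd

# Scores of school 2 from the sample data in the question.
x = pd.Series([64.0, 100.0, 68.0], name="score")

# Index: the unique scores; values: how often each one occurs.
counts = x.value_counts()

# Look up the count stored under the group's maximum score.
n_max = counts.loc[x.max()]

print(counts)
print(n_max)
```

For school 1, where both scores are 100.0, the same lookup would return 2.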

Output:

   school  count_max
0       1          2
1       2          1
