pandas: getting the highest-frequency value in one column for each group of another column

I have a Pandas dataframe like this:

    id color    size  test
0   0  blue  medium     1
1   1  blue   small     2
2   5  blue   small     4
3   2  blue     big     3
4   3   red   small     4
5   4   red   small     5

My desired output is this:

color size
blue  small
red   small

I've tried:

df = df[['id', 'color', 'size']]
df = df.groupby(['color'])['size'].value_counts()

and get this:

color  size  
blue   small     2
       big       1
       medium    1
red    small     2
Name: size, dtype: int64

but it turns into a Series indexed by (color, size) instead of a DataFrame, and the columns seem all messed up.

Basically, for each of the groups of 'color', I want the 'size' with the highest frequency. I'm really having a lot of trouble with this. Any suggestions? Thanks so much.
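The attempt above is actually close. Here is a minimal sketch that assumes the sample data from the question. The Series returned by value_counts is already sorted by frequency within each color, as the output above shows. So groupby(level=0).head(1) keeps the top size per color, and reset_index(name='count') turns the (color, size) MultiIndex back into ordinary columns. The column name 'count' is an arbitrary choice. Naming it explicitly also avoids a clash with the existing 'size' index level on pandas versions where the counts Series is itself named 'size'.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'id': [0, 1, 5, 2, 3, 4],
    'color': ['blue', 'blue', 'blue', 'blue', 'red', 'red'],
    'size': ['medium', 'small', 'small', 'big', 'small', 'small'],
    'test': [1, 2, 4, 3, 4, 5],
})

# Frequency of each size within each color: a Series with a (color, size)
# MultiIndex, sorted by count (descending) inside each color
counts = df.groupby('color')['size'].value_counts()

# Keep the first, i.e. most frequent, size per color, then move the index
# levels back into columns; 'count' is an arbitrary name for the tallies
top = counts.groupby(level=0).head(1).reset_index(name='count')
print(top[['color', 'size']])
```

On the sample data this leaves one row per color (blue with small, red with small), which matches the desired output once the count column is dropped.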

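We can count each (color, size) pair with size(), sort by that count with sort_values, then groupby the color level and take tail(1), which is the largest count for each color: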

# count each (color, size) pair, sort ascending by count, keep the last (largest) row per color
s = df.groupby(['color', 'size']).size().sort_values().groupby(level=0).tail(1).reset_index()
  color   size  0
0  blue  small  2
1   red  small  2
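If the counts themselves aren't needed, here is a sketch of an alternative (not the approach in the answer above) that aggregates each color group directly. Inside each group, value_counts() tallies the sizes and idxmax() returns the label with the largest tally. reset_index() then yields exactly the two columns from the desired output. It assumes the same sample data as the question.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'id': [0, 1, 5, 2, 3, 4],
    'color': ['blue', 'blue', 'blue', 'blue', 'red', 'red'],
    'size': ['medium', 'small', 'small', 'big', 'small', 'small'],
    'test': [1, 2, 4, 3, 4, 5],
})

# For each color, tally the sizes and pick the label with the highest count;
# if several sizes tie, idxmax() returns only one of them
most_common = (
    df.groupby('color')['size']
      .agg(lambda s: s.value_counts().idxmax())
      .reset_index()
)
print(most_common)
```

With the sort_values/tail answer, the same two columns are simply s[['color', 'size']]. Neither version reports ties: if two sizes share the top count for a color, only one of them is kept.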


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM