pandas getting highest frequency value for each group in another column

Question

I have a Pandas dataframe like this:

    id color    size  test
0   0  blue  medium     1
1   1  blue   small     2
2   5  blue   small     4
3   2  blue     big     3
4   3   red   small     4
5   4   red   small     5

My desired output is this:

color size
blue  small
red   small

I've tried:

df = df[['id', 'color', 'size']]
df = df.groupby(['color'])['size'].value_counts()

and get this:

color  size  
blue   small     2
       big       1
       medium    1
red    small     2
Name: size, dtype: int64

but the result is a Series indexed by color and size rather than a DataFrame, and the columns seem all messed up.

Basically, for each of the groups of 'color', I want the 'size' with the highest frequency. I'm really having a lot of trouble with this. Any suggestions? Thanks so much.
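For reference, the Series shown above can be turned back into ordinary columns with reset_index. The sketch below rebuilds the example frame from the table in the question. The column name count is an arbitrary choice. Passing name= should also avoid a clash with the existing size index level in pandas versions that name the counts Series size, as in the output above. This only reshapes the counts. It does not yet pick the most frequent size.

```python
import pandas as pd

# The example frame from the question
df = pd.DataFrame({
    "id":    [0, 1, 5, 2, 3, 4],
    "color": ["blue", "blue", "blue", "blue", "red", "red"],
    "size":  ["medium", "small", "small", "big", "small", "small"],
    "test":  [1, 2, 4, 3, 4, 5],
})

# value_counts on a grouped column returns a Series indexed by (color, size)
counts = df.groupby("color")["size"].value_counts()

# reset_index moves both index levels back into ordinary columns;
# name= gives the counts column an explicit label ("count" is an arbitrary choice)
counts_df = counts.reset_index(name="count")
print(counts_df)
```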

Answer 1

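We can count each (color, size) pair with groupby(...).size(), sort the counts with sort_values, then group by color again and keep the last (largest) count in each group with tail(1):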

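s = (df.groupby(['color', 'size']).size()  # number of rows for each (color, size) pair
       .sort_values()                      # ascending, so the most frequent pair comes last
       .groupby(level=0).tail(1)           # keep the last row within each color
       .reset_index())                     # turn color and size back into columns
print(s)

  color   size  0
0  blue  small  2
1   red  small  2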
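An alternative sketch, not part of the original answer: instead of sorting all counts together, aggregate each color group with a function that runs value_counts on its sizes and returns the label with the largest count via idxmax. The frame is rebuilt from the question's table. If two sizes tie for the highest count within a color, only one of them is returned, which is also true of the tail(1) approach above.

```python
import pandas as pd

# The example frame from the question
df = pd.DataFrame({
    "id":    [0, 1, 5, 2, 3, 4],
    "color": ["blue", "blue", "blue", "blue", "red", "red"],
    "size":  ["medium", "small", "small", "big", "small", "small"],
    "test":  [1, 2, 4, 3, 4, 5],
})

# For each color, count its sizes and keep the size label with the largest count.
# Series.idxmax returns the index label (here: the size) of the maximum value.
most_common = (
    df.groupby("color")["size"]
      .agg(lambda s: s.value_counts().idxmax())
      .reset_index()  # color becomes a column again; the aggregated column keeps the name "size"
)
print(most_common)
```

The result has only the color and size columns, matching the desired output in the question, without the extra count column.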

Answer 1: accepted, score 3, posted 2020-05-22 02:00:36
