Select max row per group in pandas dataframe

Question

I have a dataframe with multiple attributes, some are repeating. I want to select the rows based on the max value in one column - but return the row having that value (not the max of every column). How??

Here's a sample:

df = pd.DataFrame({'Owner': ['Bob', 'Jane', 'Amy',
                            'Steve','Kelly'],
                   'Make': ['Ford', 'Ford', 'Jeep',
                           'Ford','Jeep'],
                   'Model': ['Bronco', 'Bronco', 'Wrangler',
                            'Model T','Wrangler'],
                   'Max Speed': [80, 150, 69, 45, 72],
                  'Customer Rating': [90, 50, 91, 75, 99]})

this gives us:

I want the row having the max(customer rating) for each Make/Model. Like this:

Note this is NOT the same as df.groupby(['Make','Model']).max()

--> How do I do this?

Answer 1

A variation of your answer using idxmax :

>>> df.loc[df.groupby(['Make', 'Model'])['Customer Rating'].idxmax()]
   Owner  Make     Model  Max Speed  Customer Rating
0    Bob  Ford    Bronco         80               90
3  Steve  Ford   Model T         45               75
4  Kelly  Jeep  Wrangler         72               99

Another solution without groupby :

>>> df.sort_values('Customer Rating') \
      .drop_duplicates(['Make', 'Model'], keep='last') \
      .sort_index()

   Owner  Make     Model  Max Speed  Customer Rating
0    Bob  Ford    Bronco         80               90
3  Steve  Ford   Model T         45               75
4  Kelly  Jeep  Wrangler         72               99

Answer 2

I found an answer. I'm leaving the question up in case anyone else didn't recognize it as well.

Check out this post: Select the max row per group - pandas performance issue

I couldn't tell from that post that it was in fact what I needed, but it is. I tried two of them successfully:

def using_rank(df):
mask = (df.groupby(['Make', 'Model'])['Customer Rating'].rank(method='first', ascending=False) == 1)
return df.loc[mask]
df2 = using_rank(df)
df2

returns:

this also worked fine:

def using_sort(df):
df = df.sort_values(by=['Customer Rating'], ascending=False, kind='mergesort')
return df.groupby(['Make', 'Model'], as_index=False).first()

Answer 3

Propagate maximum in each desired group and filter out those equal to Customer Rating

df[df['Customer Rating']==df.groupby(['Make','Model'])['Customer Rating'].transform('max')]



  Owner  Make     Model  Max Speed  Customer Rating
0    Bob  Ford    Bronco         80               90
3  Steve  Ford   Model T         45               75
4  Kelly  Jeep  Wrangler         72               99

Select max row per group in pandas dataframe

Question

2 answers

solution1
3 2021-11-21 22:03:16

solution2
1 2021-11-21 21:58:12

solution3
0 2021-11-21 22:04:52

Select max row per group in pandas dataframe

Question

2 answers

solution1 3 2021-11-21 22:03:16

solution2 1 2021-11-21 21:58:12

solution3 0 2021-11-21 22:04:52

solution1
3 2021-11-21 22:03:16

solution2
1 2021-11-21 21:58:12

solution3
0 2021-11-21 22:04:52