Select max row per group in pandas dataframe

I have a dataframe with multiple attributes, some of which repeat. I want to select rows based on the max value in one column, but get back the whole row holding that value (not the max of every column). How?

Here's a sample:

import pandas as pd

df = pd.DataFrame({'Owner': ['Bob', 'Jane', 'Amy',
                             'Steve', 'Kelly'],
                   'Make': ['Ford', 'Ford', 'Jeep',
                            'Ford', 'Jeep'],
                   'Model': ['Bronco', 'Bronco', 'Wrangler',
                             'Model T', 'Wrangler'],
                   'Max Speed': [80, 150, 69, 45, 72],
                   'Customer Rating': [90, 50, 91, 75, 99]})

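This gives us:

   Owner  Make     Model  Max Speed  Customer Rating
0    Bob  Ford    Bronco         80               90
1   Jane  Ford    Bronco        150               50
2    Amy  Jeep  Wrangler         69               91
3  Steve  Ford   Model T         45               75
4  Kelly  Jeep  Wrangler         72               99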

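I want the row having the max(Customer Rating) for each Make/Model. Like this:

   Owner  Make     Model  Max Speed  Customer Rating
0    Bob  Ford    Bronco         80               90
3  Steve  Ford   Model T         45               75
4  Kelly  Jeep  Wrangler         72               99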

Note this is NOT the same as df.groupby(['Make','Model']).max()
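To make that note concrete, here is a minimal sketch on the sample frame. groupby(...).max() takes the maximum of each column separately, so a single output row can combine values from different input rows.

```python
import pandas as pd

df = pd.DataFrame({'Owner': ['Bob', 'Jane', 'Amy', 'Steve', 'Kelly'],
                   'Make': ['Ford', 'Ford', 'Jeep', 'Ford', 'Jeep'],
                   'Model': ['Bronco', 'Bronco', 'Wrangler', 'Model T', 'Wrangler'],
                   'Max Speed': [80, 150, 69, 45, 72],
                   'Customer Rating': [90, 50, 91, 75, 99]})

# max() is computed column by column inside each group.
col_max = df.groupby(['Make', 'Model']).max()

# For Ford/Bronco this pairs Jane's Max Speed (150) with Bob's
# Customer Rating (90): a combination that is not a row of df.
bronco = col_max.loc[('Ford', 'Bronco')]
print(bronco)
```

No owner in the sample has both a Max Speed of 150 and a Customer Rating of 90, which is why a row-selecting approach is needed instead.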

--> How do I do this?

A variation of your answer using idxmax:

>>> df.loc[df.groupby(['Make', 'Model'])['Customer Rating'].idxmax()]
   Owner  Make     Model  Max Speed  Customer Rating
0    Bob  Ford    Bronco         80               90
3  Steve  Ford   Model T         45               75
4  Kelly  Jeep  Wrangler         72               99

Another solution without groupby:

>>> df.sort_values('Customer Rating') \
      .drop_duplicates(['Make', 'Model'], keep='last') \
      .sort_index()

   Owner  Make     Model  Max Speed  Customer Rating
0    Bob  Ford    Bronco         80               90
3  Steve  Ford   Model T         45               75
4  Kelly  Jeep  Wrangler         72               99

I found an answer. I'm leaving the question up in case anyone else didn't recognize it as well.

Check out this post: Select the max row per group - pandas performance issue

I couldn't tell from that post that it was in fact what I needed, but it is. I tried two of its answers successfully:

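def using_rank(df):
    # Rank ratings within each Make/Model group, highest first;
    # method='first' breaks ties by order of appearance.
    mask = (df.groupby(['Make', 'Model'])['Customer Rating']
              .rank(method='first', ascending=False) == 1)
    return df.loc[mask]

df2 = using_rank(df)
df2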

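returns:

   Owner  Make     Model  Max Speed  Customer Rating
0    Bob  Ford    Bronco         80               90
3  Steve  Ford   Model T         45               75
4  Kelly  Jeep  Wrangler         72               99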

this also worked fine:

def using_sort(df):
    # Stable sort, highest rating first, then take the first row per group.
    df = df.sort_values(by=['Customer Rating'], ascending=False, kind='mergesort')
    return df.groupby(['Make', 'Model'], as_index=False).first()
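One thing to be aware of with the sort-based version: the result of first() gets a fresh 0..n-1 index with Make and Model moved to the front, and first() takes the first non-null value of each column, so missing values could in principle mix rows. A hedged variant, assuming you want whole rows with their original index labels, is to use head(1) instead. The name using_sort_head is hypothetical, not from the linked post.

```python
import pandas as pd

df = pd.DataFrame({'Owner': ['Bob', 'Jane', 'Amy', 'Steve', 'Kelly'],
                   'Make': ['Ford', 'Ford', 'Jeep', 'Ford', 'Jeep'],
                   'Model': ['Bronco', 'Bronco', 'Wrangler', 'Model T', 'Wrangler'],
                   'Max Speed': [80, 150, 69, 45, 72],
                   'Customer Rating': [90, 50, 91, 75, 99]})

def using_sort_head(df):
    # Hypothetical variant: stable sort with the highest rating first ...
    ordered = df.sort_values(by=['Customer Rating'], ascending=False,
                             kind='mergesort')
    # ... then keep the first whole row of each group. head(1) returns the
    # rows unchanged, with their original index labels and column order.
    return ordered.groupby(['Make', 'Model']).head(1).sort_index()

result = using_sort_head(df)
print(result)
```

On the sample data this selects the same three owners as the other approaches.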

Broadcast each group's maximum Customer Rating to every row of the group with transform, then keep only the rows whose Customer Rating equals that maximum:

df[df['Customer Rating']==df.groupby(['Make','Model'])['Customer Rating'].transform('max')]

  Owner  Make     Model  Max Speed  Customer Rating
0    Bob  Ford    Bronco         80               90
3  Steve  Ford   Model T         45               75
4  Kelly  Jeep  Wrangler         72               99
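The transform approach behaves differently from idxmax when ratings tie, which may or may not be what you want. A small sketch on hypothetical data (the tied frame below is made up for illustration and is not the sample from the question):

```python
import pandas as pd

# Hypothetical data: Amy and Kelly tie for the top Wrangler rating.
tied = pd.DataFrame({'Owner': ['Bob', 'Amy', 'Kelly'],
                     'Make': ['Ford', 'Jeep', 'Jeep'],
                     'Model': ['Bronco', 'Wrangler', 'Wrangler'],
                     'Customer Rating': [90, 99, 99]})

grouped = tied.groupby(['Make', 'Model'])['Customer Rating']

# transform('max') keeps every row that reaches its group's maximum.
all_top = tied[tied['Customer Rating'] == grouped.transform('max')]

# idxmax keeps exactly one row per group: the first occurrence of the max.
one_top = tied.loc[grouped.idxmax()]

print(all_top['Owner'].tolist())
print(one_top['Owner'].tolist())
```

So transform returns all tied winners, while idxmax returns one row per group.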
