简体   繁体   中英

Pandas Groupby returning max value and other specific column

Supposed a dataframe

import pandas as pd

df = pd.DataFrame({
    'Model': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
    'Year': [2019, 2020, 2021, 2018, 2019, 2020, 2021],
    'Transmission': ['Manual', 'Automatic', 'Automatic', 'Manual', 'Automatic', 'Automatic', 'Manual'],
    'EngineSize': [1.4, 2.0, 1.4, 1.5, 2.0, 1.5, 1.5],
    'MPG': [55.4, 67.3, 58.9, 52.3, 64.2, 68.9, 83.1]
})
df

and I want to return the highest MPG per year plus the model. Looked like this

Year  MPG   Model
2018 52.3   D
2019 64.2   E
2020 68.9   F
2021 83.1   G

I'm thinking by using groupby but still stuck on how to show the Model column.

You could use groupby + idxmax to get the index of the max MPG of each year; then use loc to filter:

out = df.loc[df.groupby('Year')['MPG'].idxmax(), ['Year', 'MPG', 'Model']]

Output:

   Year   MPG Model
3  2018  52.3     D
4  2019  64.2     E
5  2020  68.9     F
6  2021  83.1     G

I like @enke's answer better. But you could use the groupby apply with pd.DataFrame.nlargest

df.groupby('Year').apply(pd.DataFrame.nlargest, n=1, columns=['MPG'])

       Model  Year Transmission  EngineSize   MPG
Year                                             
2018 3     D  2018       Manual         1.5  52.3
2019 4     E  2019    Automatic         2.0  64.2
2020 5     F  2020    Automatic         1.5  68.9
2021 6     G  2021       Manual         1.5  83.1

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM