How to get the whole row based on a max value from one column in pandas.groupby().max()?

I need the whole row that holds the maximum value of one column within each group, not separate maximums taken from different rows. In my example the maximum should be based on the column 'Number'. This is what I tried:

import pandas as pd

data = {
    'Number':[12,55,3,2,88,17],
    'People':['Zack','Zack','Merry','Merry','Cross','Cross'],
    'Random':[353,0.5454,0.5454336,32,-7,4]
}

df = pd.DataFrame(data, columns=['Number', 'People', 'Random'])

print(df,'\n')

max_values = df.groupby('People').max()

print(max_values)

Here is the result:

   Number People      Random
0      12   Zack  353.000000
1      55   Zack    0.545400
2       3  Merry    0.545434
3       2  Merry   32.000000
4      88  Cross   -7.000000
5      17  Cross    4.000000 

        Number  Random
People                
Cross       88     4.0
Merry        3    32.0
Zack        55   353.0

Here is the expected result for max_values :

        Number      Random
People                    
Cross       88   -7.000000
Merry        3    0.545434
Zack        55  353.000000

You could do the following:

import pandas as pd

data = {
    'Number':[12,55,3,2,88,17],
    'People':['Zack','Zack','Merry','Merry','Cross','Cross'],
    'Random':[353,0.5454,0.5454336,32,-7,4]
}

df = pd.DataFrame(data, columns=['Number', 'People', 'Random'])

print(df,'\n')

# Keep only the rows whose Number equals the maximum Number of their People group.
res = df[df.groupby('People')['Number'].transform('max') == df['Number']].set_index('People')
print(res)

Which gives the following output:

        Number    Random
People                  
Zack        55  0.545400
Merry        3  0.545434
Cross       88 -7.000000

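The problem in your code is that groupby().max() computes the maximum of each column independently, so the values in one output row can come from different source rows. Filtering the original DataFrame with a boolean mask keeps whole rows and avoids this.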
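
To illustrate the point about per-column maximums, here is a small sketch using the data from the question. It checks whether the 'Cross' row returned by groupby().max() exists anywhere in the original DataFrame. The variable names per_column, cross and exists are only illustrative.

```python
import pandas as pd

data = {
    'Number': [12, 55, 3, 2, 88, 17],
    'People': ['Zack', 'Zack', 'Merry', 'Merry', 'Cross', 'Cross'],
    'Random': [353, 0.5454, 0.5454336, 32, -7, 4],
}
df = pd.DataFrame(data, columns=['Number', 'People', 'Random'])

# max() is evaluated separately for Number and for Random within each group.
per_column = df.groupby('People').max()
cross = per_column.loc['Cross']  # Number 88 (from row 4), Random 4.0 (from row 5)

# Look for an original row that carries both of these values at once.
exists = (
    (df['People'] == 'Cross')
    & (df['Number'] == cross['Number'])
    & (df['Random'] == cross['Random'])
).any()
print(exists)  # False: no single row has Number 88 and Random 4.0
```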

Note: the expected output in the question contains a mistake. The Zack row with Number 55 has Random 0.5454, not 353.0.

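You could try something like this:

# Store each group's maximum Number in a helper column, then filter on it.
# drop() returns a new frame, so df itself keeps the max_number column.
df['max_number'] = df.groupby('People')['Number'].transform('max')
df[df.Number == df.max_number].drop('max_number', axis=1).set_index('People')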

        Number    Random
People                  
Zack        55  0.545400
Merry        3  0.545434
Cross       88 -7.000000
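
If you would rather not add a helper column, one possible alternative (a sketch, not taken from the answers here) is idxmax, which returns the index label of each group's maximum, followed by loc to select those rows. It assumes the index labels are unique and that every group has at least one non-NaN Number. Note that the groups come back sorted by key, so the row order differs from the output above.

```python
import pandas as pd

data = {
    'Number': [12, 55, 3, 2, 88, 17],
    'People': ['Zack', 'Zack', 'Merry', 'Merry', 'Cross', 'Cross'],
    'Random': [353, 0.5454, 0.5454336, 32, -7, 4],
}
df = pd.DataFrame(data, columns=['Number', 'People', 'Random'])

# Index label of the row holding the largest Number within each People group.
idx = df.groupby('People')['Number'].idxmax()

# Select those complete rows; each row keeps its own Random value.
res = df.loc[idx].set_index('People')
print(res)
```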

This is a more straightforward way to do it, IMHO:

df.sort_values('Number').groupby('People').tail(1)

(Maybe also change your column name to "Name")
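
One difference between the approaches shows up when a group has tied maximums. The sketch below uses hypothetical data (the tied frame is not from the question): the boolean-mask filter keeps every tied row, while sort_values followed by tail(1) keeps exactly one row per group.

```python
import pandas as pd

# Hypothetical data with a tie: Zack has two rows with Number 55.
tied = pd.DataFrame({
    'Number': [55, 55, 3],
    'People': ['Zack', 'Zack', 'Merry'],
    'Random': [1.0, 2.0, 3.0],
})

# Boolean mask: keeps every row that equals its group's maximum.
mask_rows = tied[tied.groupby('People')['Number'].transform('max') == tied['Number']]

# Sort, then tail(1): keeps exactly one row per group.
tail_rows = tied.sort_values('Number').groupby('People').tail(1)

print(len(mask_rows))  # 3
print(len(tail_rows))  # 2
```

Which behavior you want depends on whether tied rows should all be reported or collapsed to one.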
