
Pandas dataframe: take the max value of each column per group with groupby

I have a dataframe with many columns; the type columns are categorical and the val columns are numeric:

  type1 type2 type3  val1  val2  val3
0     a     b     q     1     2     3
1     a     c     w     3     5     2
2     b     c     t     2     9     0
3     a     b     p     4     6     7
4     a     c     m     2     1     8

I want to group by ["type1","type2"] and, within each group, take the maximum value of every other column, so that each group collapses into a single row:

  type1 type2 type3  val1  val2  val3
0     a     b     q     4     6     7
1     a     c     w     3     5     8
2     b     c     t     2     9     0

Explanation: val3 in the first row is 7 because that is the maximum val3 among the rows where type1 = a and type2 = b.

Similarly, val3 in the second row is 8 because that is the maximum val3 among the rows where type1 = a and type2 = c.
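To check the explanation by hand, a minimal sketch (assuming the sample data is loaded into a DataFrame named df; the mask names are just illustrative) selects the rows of one group with a boolean mask and takes the column max:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "type1": ["a", "a", "b", "a", "a"],
    "type2": ["b", "c", "c", "b", "c"],
    "type3": ["q", "w", "t", "p", "m"],
    "val1": [1, 3, 2, 4, 2],
    "val2": [2, 5, 9, 6, 1],
    "val3": [3, 2, 0, 7, 8],
})

# Boolean masks selecting the rows of the (a, b) and (a, c) groups
mask_ab = (df["type1"] == "a") & (df["type2"] == "b")
mask_ac = (df["type1"] == "a") & (df["type2"] == "c")

print(df.loc[mask_ab, "val3"].max())  # 7, from the values 3 and 7
print(df.loc[mask_ac, "val3"].max())  # 8, from the values 2 and 8
```

groupby does the same thing for every (type1, type2) pair at once, which is what the answer below relies on.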

If you need to aggregate all columns by max:

df = df.groupby(["type1","type2"]).max()
print(df)
            type3  val1  val2  val3
type1 type2                        
a     b         q     4     6     7
      c         w     3     5     8
b     c         t     2     9     0
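The result above keeps type1 and type2 as a MultiIndex. If you want them back as ordinary columns, as in the desired output in the question, one sketch passes as_index=False (calling reset_index() on the result would work too). Note that type3 is a string column, so max compares it lexicographically: 'q' > 'p' and 'w' > 'm', which is why q and w survive.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "type1": ["a", "a", "b", "a", "a"],
    "type2": ["b", "c", "c", "b", "c"],
    "type3": ["q", "w", "t", "p", "m"],
    "val1": [1, 3, 2, 4, 2],
    "val2": [2, 5, 9, 6, 1],
    "val3": [3, 2, 0, 7, 8],
})

# as_index=False keeps the grouping keys as regular columns instead of a MultiIndex;
# max is applied to every remaining column (lexicographically for the string column type3)
out = df.groupby(["type1", "type2"], as_index=False).max()
print(out)
```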

If some columns need a different aggregation, you can create a dictionary mapping column names to aggregation functions and then override the function for specific columns; here type3 uses first and val1 uses last:

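# df here is the original (ungrouped) dataframe
# start with 'max' for every column except the grouping keys
d = dict.fromkeys(df.columns.difference(['type1','type2']), 'max')
# override the aggregation for individual columns
d['type3'] = 'first'
d['val1'] = 'last'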

df = df.groupby(["type1","type2"], as_index=False, sort=False).agg(d)
print(df)
  type1 type2 type3  val1  val2  val3
0     a     b     q     4     6     7
1     a     c     w     2     5     8
2     b     c     t     2     9     0
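As an alternative to building the dictionary, pandas' named aggregation syntax (available since pandas 0.25) spells out each output column as a (source column, function) pair. This is a sketch that should reproduce the table above; the variable name out is just illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "type1": ["a", "a", "b", "a", "a"],
    "type2": ["b", "c", "c", "b", "c"],
    "type3": ["q", "w", "t", "p", "m"],
    "val1": [1, 3, 2, 4, 2],
    "val2": [2, 5, 9, 6, 1],
    "val3": [3, 2, 0, 7, 8],
})

# Each keyword is an output column; its value is (source column, aggregation)
out = (
    df.groupby(["type1", "type2"], sort=False)  # keep groups in order of first appearance
    .agg(
        type3=("type3", "first"),
        val1=("val1", "last"),
        val2=("val2", "max"),
        val3=("val3", "max"),
    )
    .reset_index()  # move type1/type2 from the index back into columns
)
print(out)
```

Named aggregation is explicit but verbose; when most columns share one function, the dictionary built with dict.fromkeys scales better.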

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM