Pandas DataFrame groupby: take the maximum value of each group

Question

I have a DataFrame with many columns. The type columns are categorical and the rest are numeric:

type1  type2  type3  val1  val2  val3
a      b      q      1     2     3
a      c      w      3     5     2
b      c      t      2     9     0
a      b      p      4     6     7
a      c      m      2     1     8
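
To make the example reproducible, the table above can be built as a pandas DataFrame. This is a minimal sketch: the column names and values are copied from the question, and only the six columns shown are included (the real frame is said to have more).

```python
import pandas as pd

# Sample data from the question: three string "type" columns and three numeric columns
df = pd.DataFrame({
    "type1": ["a", "a", "b", "a", "a"],
    "type2": ["b", "c", "c", "b", "c"],
    "type3": ["q", "w", "t", "p", "m"],
    "val1": [1, 3, 2, 4, 2],
    "val2": [2, 5, 9, 6, 1],
    "val3": [3, 2, 0, 7, 8],
})
print(df)
```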

I want to merge the rows of each group produced by groupby(["type1","type2"]) into a single row that holds the maximum value of each column within that group:

type1  type2  type3  val1  val2  val3
a      b      q      4     6     7
a      c      w      3     5     8
b      c      t      2     9     0

Explanation: val3 in the first row is 7 because that is the maximum value where type1 = a and type2 = b.

Similarly, val3 in the second row is 8 because that is the maximum value where type1 = a and type2 = c.
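
The two explanations above can be checked by hand: select the rows of one group with a boolean mask and take the column maximum. This sketch rebuilds the sample data from the question; the mask variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "type1": ["a", "a", "b", "a", "a"],
    "type2": ["b", "c", "c", "b", "c"],
    "type3": ["q", "w", "t", "p", "m"],
    "val1": [1, 3, 2, 4, 2],
    "val2": [2, 5, 9, 6, 1],
    "val3": [3, 2, 0, 7, 8],
})

# Rows belonging to the groups (a, b) and (a, c)
mask_ab = (df["type1"] == "a") & (df["type2"] == "b")
mask_ac = (df["type1"] == "a") & (df["type2"] == "c")

# Maximum of val3 within each group: max(3, 7) and max(2, 8)
print(df.loc[mask_ab, "val3"].max())
print(df.loc[mask_ac, "val3"].max())
```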

Answer 1

If you need to aggregate all columns by max:

# Aggregate every non-key column with max (type3 is compared as strings)
df1 = df.groupby(["type1","type2"]).max()
print (df1)
            type3  val1  val2  val3
type1 type2                        
a     b         q     4     6     7
      c         w     3     5     8
b     c         t     2     9     0
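
If you want the flat layout from the question (type1 and type2 as ordinary columns rather than a MultiIndex), one option is to pass as_index=False. This is a sketch on the question's sample data; note that max() on the string column type3 picks the lexicographically largest value, which happens to match the desired output here.

```python
import pandas as pd

df = pd.DataFrame({
    "type1": ["a", "a", "b", "a", "a"],
    "type2": ["b", "c", "c", "b", "c"],
    "type3": ["q", "w", "t", "p", "m"],
    "val1": [1, 3, 2, 4, 2],
    "val2": [2, 5, 9, 6, 1],
    "val3": [3, 2, 0, 7, 8],
})

# as_index=False keeps the group keys as columns with a default integer index
out = df.groupby(["type1", "type2"], as_index=False).max()
print(out)
```

Calling reset_index() on the MultiIndex result gives the same layout.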

If some columns need a different aggregation, you can create a dictionary that maps column names to aggregate functions and then override the function for specific columns. For example, type3 uses first and val1 uses last:

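# Default: aggregate every non-key column with max
d = dict.fromkeys(df.columns.difference(['type1','type2']), 'max')
# Override the aggregation for selected columns
d['type3'] = 'first'
d['val1'] = 'last'

# as_index=False keeps the keys as columns; sort=False keeps groups in order of first appearance
df2 = df.groupby(["type1","type2"], as_index=False, sort=False).agg(d)
print (df2)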
  type1 type2 type3  val1  val2  val3
0     a     b     q     4     6     7
1     a     c     w     2     5     8
2     b     c     t     2     9     0
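
An alternative, assuming pandas 0.25 or later, is named aggregation, where each output column is given as a (source column, function) pair instead of building a dictionary. This sketch reproduces the dictionary version above on the question's sample data; keeping the output names equal to the input names is a choice made here, not a requirement.

```python
import pandas as pd

df = pd.DataFrame({
    "type1": ["a", "a", "b", "a", "a"],
    "type2": ["b", "c", "c", "b", "c"],
    "type3": ["q", "w", "t", "p", "m"],
    "val1": [1, 3, 2, 4, 2],
    "val2": [2, 5, 9, 6, 1],
    "val3": [3, 2, 0, 7, 8],
})

# Named aggregation: output_name=(input_column, aggregate_function)
out = (
    df.groupby(["type1", "type2"], sort=False)
      .agg(
          type3=("type3", "first"),
          val1=("val1", "last"),
          val2=("val2", "max"),
          val3=("val3", "max"),
      )
      .reset_index()  # move type1/type2 from the index back into columns
)
print(out)
```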

Answer 1 · 0 · accepted · 2020-08-04 11:15:26

Pandas dataframe groupby 时每组取组最大值

问题描述

1 个解决方案

解决方案1 0 已采纳 2020-08-04 11:15:26

解决方案1
0 已采纳 2020-08-04 11:15:26