Pandas dataframe: take the max value per group when using groupby
I have a dataframe with many columns; a few are categorical and the rest are numeric:
df = [type1  type2  type3  val1  val2  val3
      a      b      q      1     2     3
      a      c      w      3     5     2
      b      c      t      2     9     0
      a      b      p      4     6     7
      a      c      m      2     1     8]
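For anyone who wants to reproduce this, the table above could be built roughly as follows. This is only a sketch: the dtypes (strings for the `type*` columns, integers for the `val*` columns) are assumed from the values shown.

```python
import pandas as pd

# Sample data copied from the table above; dtypes are an assumption.
df = pd.DataFrame({
    "type1": ["a", "a", "b", "a", "a"],
    "type2": ["b", "c", "c", "b", "c"],
    "type3": ["q", "w", "t", "p", "m"],
    "val1": [1, 3, 2, 4, 2],
    "val2": [2, 5, 9, 6, 1],
    "val3": [3, 2, 0, 7, 8],
})
print(df)
```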
I want to merge the rows with groupby(["type1","type2"]) so that each group becomes a single row holding the max value of each column within that group:
df = [type1  type2  type3  val1  val2  val3
      a      b      q      2     6     7
      a      c      w      4     5     8
      b      c      t      2     9     0]
Explanation: val3 of the first row is 7 because that is the maximal value when type1 = a, type2 = b.
Similarly, val3 of the second row is 8 because that is the maximal value when type1 = a, type2 = c.
If you need to aggregate all columns by max:
df = df.groupby(["type1","type2"]).max()
print (df)
            type3  val1  val2  val3
type1 type2
a     b         q     4     6     7
      c         w     3     5     8
b     c         t     2     9     0
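Two things are worth noting about the result above. The group keys end up in the index, and for a string column such as type3, max returns the lexicographically largest value (here q rather than p). If you would rather keep type1 and type2 as ordinary columns, one option is `as_index=False`. A self-contained sketch, with the sample frame rebuilt from the question's table (dtypes assumed):

```python
import pandas as pd

# Sample data rebuilt from the question; dtypes are an assumption.
df = pd.DataFrame({
    "type1": ["a", "a", "b", "a", "a"],
    "type2": ["b", "c", "c", "b", "c"],
    "type3": ["q", "w", "t", "p", "m"],
    "val1": [1, 3, 2, 4, 2],
    "val2": [2, 5, 9, 6, 1],
    "val3": [3, 2, 0, 7, 8],
})

# as_index=False keeps the group keys as regular columns instead of a MultiIndex.
out = df.groupby(["type1", "type2"], as_index=False).max()
print(out)
```

Calling `.reset_index()` on the indexed result should give the same frame.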
If you need some columns aggregated differently, you can create a dictionary that maps every column name to an aggregate function and then override the function for specific columns. Here first is used for type3 and last for val1 (run this on the original df, not on the aggregated result above):
d = dict.fromkeys(df.columns.difference(['type1','type2']), 'max')
d['type3'] = 'first'
d['val1'] = 'last'
df = df.groupby(["type1","type2"], as_index=False, sort=False).agg(d)
print (df)
  type1 type2 type3  val1  val2  val3
0     a     b     q     4     6     7
1     a     c     w     2     5     8
2     b     c     t     2     9     0
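As an alternative to the dictionary, the same per-column choices can be spelled out with named aggregation (available since pandas 0.25). This is a sketch rather than a replacement for the approach above; the sample frame is rebuilt from the question's table, and the output column names simply reuse the input names.

```python
import pandas as pd

# Sample data rebuilt from the question; dtypes are an assumption.
df = pd.DataFrame({
    "type1": ["a", "a", "b", "a", "a"],
    "type2": ["b", "c", "c", "b", "c"],
    "type3": ["q", "w", "t", "p", "m"],
    "val1": [1, 3, 2, 4, 2],
    "val2": [2, 5, 9, 6, 1],
    "val3": [3, 2, 0, 7, 8],
})

# Each keyword is an output column: (input column, aggregate function).
# sort=False keeps groups in order of first appearance.
out = df.groupby(["type1", "type2"], as_index=False, sort=False).agg(
    type3=("type3", "first"),
    val1=("val1", "last"),
    val2=("val2", "max"),
    val3=("val3", "max"),
)
print(out)
```

The dictionary version scales better when there are many numeric columns, since `dict.fromkeys` fills in max for all of them at once; named aggregation is more explicit when only a handful of columns are involved.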