I have a dataframe with many columns; a few are categorical and the rest are numeric:
df = [type1  type2  type3  val1  val2  val3
      a      b      q      1     2     3
      a      c      w      3     5     2
      b      c      t      2     9     0
      a      b      p      4     6     7
      a      c      m      2     1     8]
I want to merge rows based on groupby(["type1","type2"]), taking the max value of each column within each group:
df = [type1  type2  type3  val1  val2  val3
      a      b      q      4     6     7
      a      c      w      3     5     8
      b      c      t      2     9     0]
Explanation: val3 of the first row is 7 because this is the maximal value when type1 = a, type2 = b. Similarly, val3 of the second row is 8 because this is the maximal value when type1 = a, type2 = c.
If you need to aggregate all columns by max:
df = df.groupby(["type1","type2"]).max()
print (df)
            type3  val1  val2  val3
type1 type2
a     b         q     4     6     7
      c         w     3     5     8
b     c         t     2     9     0
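For anyone who wants to reproduce this, here is a minimal self-contained sketch that rebuilds the sample frame from the question. The name `out` and the use of `as_index=False` (which keeps type1 and type2 as ordinary columns instead of an index) are my own choices, not part of the question. Note that max on the string column type3 takes the lexicographically largest value in each group, which here happens to coincide with the first value.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "type1": ["a", "a", "b", "a", "a"],
    "type2": ["b", "c", "c", "b", "c"],
    "type3": ["q", "w", "t", "p", "m"],
    "val1": [1, 3, 2, 4, 2],
    "val2": [2, 5, 9, 6, 1],
    "val3": [3, 2, 0, 7, 8],
})

# Column-wise max within each (type1, type2) group;
# as_index=False keeps the group keys as regular columns
out = df.groupby(["type1", "type2"], as_index=False).max()
print(out)
```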
If some columns need a different aggregation, you can create a dictionary that maps column names to aggregate functions and then override the function for specific columns. Here type3 uses first and val1 uses last (run this on the original df, not on the grouped result above):
d = dict.fromkeys(df.columns.difference(['type1','type2']), 'max')
d['type3'] = 'first'
d['val1'] = 'last'
df = df.groupby(["type1","type2"], as_index=False, sort=False).agg(d)
print (df)
  type1 type2 type3  val1  val2  val3
0     a     b     q     4     6     7
1     a     c     w     2     5     8
2     b     c     t     2     9     0
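The same per-column choice can also be written with named aggregation, assuming pandas 0.25 or later. This is only a sketch of an alternative spelling: the frame is rebuilt from the question's sample data, and the names `keys` and `out` are hypothetical.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "type1": ["a", "a", "b", "a", "a"],
    "type2": ["b", "c", "c", "b", "c"],
    "type3": ["q", "w", "t", "p", "m"],
    "val1": [1, 3, 2, 4, 2],
    "val2": [2, 5, 9, 6, 1],
    "val3": [3, 2, 0, 7, 8],
})

keys = ["type1", "type2"]

# Named aggregation: output_column=(input_column, function)
out = df.groupby(keys, as_index=False, sort=False).agg(
    type3=("type3", "first"),  # first value seen in each group
    val1=("val1", "last"),     # last value seen in each group
    val2=("val2", "max"),
    val3=("val3", "max"),
)
print(out)
```

Named aggregation requires listing every output column explicitly, so with many numeric columns the dict.fromkeys approach is probably less typing.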