I have a DataFrame that looks like this:
import pandas as pd

df = pd.DataFrame({'a': ['A', 'B', 'B', 'C', 'C', 'D', 'D', 'E'],
                   'b': ['Y', 'Y', 'N', 'Y', 'Y', 'N', 'N', 'N'],
                   'c': [20, 5, 12, 8, 15, 10, 25, 13]})
a b c
0 A Y 20
1 B Y 5
2 B N 12
3 C Y 8
4 C Y 15
5 D N 10
6 D N 25
7 E N 13
I would like to group by column 'a', check whether any value of column 'b' in the group is 'Y' (or True) and keep that value, and sum column 'c'.
The resulting DataFrame should look like this:
a b c
0 A Y 20
1 B Y 17
2 C Y 23
3 D N 35
4 E N 13
I tried the following but got an error:
df.groupby('a')['b'].max()['c'].sum()
You can use agg with max and sum. Taking max of column 'b' works because strings compare lexicographically, so 'Y' > 'N' is True and any group containing a 'Y' yields 'Y':
print(df.groupby('a', as_index=False).agg({'b': 'max', 'c': 'sum'}))
a b c
0 A Y 20
1 B Y 17
2 C Y 23
3 D N 35
4 E N 13
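The max trick depends on how the labels happen to sort. If you would rather spell out the "any 'Y' in the group" rule, a minimal sketch is below. It assumes column 'b' holds only the strings 'Y' and 'N' and a pandas version that supports named aggregation (0.25 or later). The names has_yes and result are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': ['A', 'B', 'B', 'C', 'C', 'D', 'D', 'E'],
                   'b': ['Y', 'Y', 'N', 'Y', 'Y', 'N', 'N', 'N'],
                   'c': [20, 5, 12, 8, 15, 10, 25, 13]})

# Illustrative boolean flag: True where 'b' equals 'Y'.
# (If 'b' already holds booleans, this step can be skipped.)
has_yes = df['b'].eq('Y')

result = (
    df.assign(b=has_yes)
      .groupby('a', as_index=False)
      .agg(b=('b', 'any'), c=('c', 'sum'))  # any 'Y' in the group, sum of 'c'
)

# Map the boolean back to the original 'Y'/'N' labels.
result['b'] = result['b'].map({True: 'Y', False: 'N'})
print(result)
```

For this data it should give the same table as the max version. The difference is only that the rule for 'b' is stated explicitly instead of relying on string ordering.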