
pandas: groupby sum conditional on other column

I have a DataFrame that looks like this:

import pandas as pd

df = pd.DataFrame({'a': ['A', 'B', 'B', 'C', 'C', 'D', 'D', 'E'],
                   'b': ['Y', 'Y', 'N', 'Y', 'Y', 'N', 'N', 'N'],
                   'c': [20, 5, 12, 8, 15, 10, 25, 13]})

   a  b   c
0  A  Y  20
1  B  Y   5
2  B  N  12
3  C  Y   8
4  C  Y  15
5  D  N  10
6  D  N  25
7  E  N  13

I would like to group by column 'a', check whether any value of column 'b' in the group is 'Y' (or True), keep that value for the group, and then sum column 'c'.

The resulting DataFrame should look like this:

   a  b   c
0  A  Y  20
1  B  Y  17
2  C  Y  23
3  D  N  35
4  E  N  13

I tried the following, but it raises an error:

df.groupby('a')['b'].max()['c'].sum()
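The chained call cannot work as written: once max() has been applied to column 'b', the result no longer carries column 'c'. A minimal sketch that keeps the idea of the attempt, aggregating 'b' and 'c' separately and then aligning the two results on the group key 'a', could look like this (the names b_max, c_sum and result are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({'a': ['A', 'B', 'B', 'C', 'C', 'D', 'D', 'E'],
                   'b': ['Y', 'Y', 'N', 'Y', 'Y', 'N', 'N', 'N'],
                   'c': [20, 5, 12, 8, 15, 10, 25, 13]})

# Each aggregation returns a Series indexed by the values of 'a'.
b_max = df.groupby('a')['b'].max()   # 'Y' beats 'N' in string comparison
c_sum = df.groupby('a')['c'].sum()   # total of 'c' per group

# Align both Series on the shared 'a' index, then turn 'a' back into a column.
result = pd.concat([b_max, c_sum], axis=1).reset_index()
print(result)
```

This is more verbose than a single agg call, but it shows explicitly that each column needs its own aggregation before the results can be combined.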

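You can use agg with 'max' for column 'b' and 'sum' for column 'c'. Taking the max of column 'b' works because strings compare lexicographically and 'Y' > 'N' is True, so any group that contains at least one 'Y' ends up with 'Y'. Your attempt fails because df.groupby('a')['b'].max() returns a Series indexed by the values of 'a', so the subsequent ['c'] lookup finds no such label and raises a KeyError.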

print(df.groupby('a', as_index=False).agg({'b': 'max', 'c': 'sum'}))

   a  b   c
0  A  Y  20
1  B  Y  17
2  C  Y  23
3  D  N  35
4  E  N  13
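If column 'b' holds (or is converted to) real booleans, as the "or True" part of the question suggests, 'any' states the intent more directly than relying on 'Y' > 'N'. A minimal sketch, assuming the 'Y'/'N' labels are first turned into a boolean flag and using named aggregation (available since pandas 0.25); the names flagged and out are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'a': ['A', 'B', 'B', 'C', 'C', 'D', 'D', 'E'],
                   'b': ['Y', 'Y', 'N', 'Y', 'Y', 'N', 'N', 'N'],
                   'c': [20, 5, 12, 8, 15, 10, 25, 13]})

# Assumption: treat 'Y' as True and everything else as False.
flagged = df.assign(b=df['b'].eq('Y'))

# Named aggregation: 'any' is True if at least one row in the group is True.
out = (flagged.groupby('a')
              .agg(b=('b', 'any'), c=('c', 'sum'))
              .reset_index())

# Optional: map the boolean flag back to the original 'Y'/'N' labels.
out['b'] = out['b'].map({True: 'Y', False: 'N'})
print(out)
```

If you are happy with a boolean result column, simply drop the final map step.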
