pandas: groupby sum conditional on other column

Question

I have a DataFrame that looks like this:

import pandas as pd

df = pd.DataFrame({'a': ['A', 'B', 'B', 'C', 'C', 'D', 'D', 'E'],
                   'b': ['Y', 'Y', 'N', 'Y', 'Y', 'N', 'N', 'N'],
                   'c': [20, 5, 12, 8, 15, 10, 25, 13]})

   a  b   c
0  A  Y  20
1  B  Y   5
2  B  N  12
3  C  Y   8
4  C  Y  15
5  D  N  10
6  D  N  25
7  E  N  13

I would like to group by column 'a', set 'b' to 'Y' (or True) if any row in the group has that value in column 'b', and sum column 'c' within each group.

The resulting DataFrame should look like this:

   a  b   c
0  A  Y  20
1  B  Y  17
2  C  Y  23
3  D  N  35
4  E  N  13

I tried the following, but it raises an error:

df.groupby('a')['b'].max()['c'].sum()

Answer 1

You can use agg with max and sum. Taking max on column 'b' works because strings compare lexicographically: 'Y' > 'N' evaluates to True, so any group containing at least one 'Y' returns 'Y'.

print(df.groupby('a', as_index=False).agg({'b': 'max', 'c': 'sum'}))

   a  b   c
0  A  Y  20
1  B  Y  17
2  C  Y  23
3  D  N  35
4  E  N  13
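
The max trick depends on 'Y' happening to sort after 'N'. If the flag values were different strings, a more explicit variant may be safer. The following is a minimal sketch, not part of the original answer, assuming pandas 0.25 or later (for named aggregation) and assuming that "any row equals 'Y'" is the intended condition. The name `out` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': ['A', 'B', 'B', 'C', 'C', 'D', 'D', 'E'],
                   'b': ['Y', 'Y', 'N', 'Y', 'Y', 'N', 'N', 'N'],
                   'c': [20, 5, 12, 8, 15, 10, 25, 13]})

# Named aggregation: test each group for any 'Y' explicitly
# instead of relying on the sort order of the flag strings.
out = df.groupby('a', as_index=False).agg(
    b=('b', lambda s: 'Y' if s.eq('Y').any() else 'N'),
    c=('c', 'sum'),
)
print(out)
```

For this data it should give the same five rows as the agg call with max and sum. The original attempt fails for a different reason: df.groupby('a')['b'].max() returns a Series indexed by the values of 'a', so the following ['c'] lookup finds no such label.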

1 answer

solution1
1 ACCEPTED 2020-07-01 21:49:23