Sum column based on another column in Pandas DataFrame

Question

I have a pandas DataFrame like this:

>>> df = pd.DataFrame({'MONTREGL':[10,10,2222,35,200,56,5555],'SINID':['aaa','aaa','aaa','bbb','bbb','ccc','ccc'],'EXTRA':[400,400,400,500,500,333,333]})
>>> df
   MONTREGL SINID EXTRA
0        10   aaa   400
1        10   aaa   400
2      2222   aaa   400
3        35   bbb   500
4       200   bbb   500
5        56   ccc   333
6      5555   ccc   333

I want to sum the column MONTREGL for each groupby SINID ...

So I get 2242 for aaa and so on... ALSO I want to keep the value of column EXTRA .

This is the expected result:

   MONTREGL SINID EXTRA
0      2242   aaa   400
1       235   bbb   500
2      5611   ccc   333

Thanks for your help in advance!

Answer 1

I ended up using this script:

dff = df.groupby(["SINID","EXTRA"]).MONTREGL.sum().reset_index()

And it works in this test and production.

Answer 2

The code below works for your example:

df1 = df.groupby(["SINID"]).sum()
df1['EXTRA'] = df.groupby(["SINID"]).mean()['EXTRA']

Result :

       MONTREGL  EXTRA
SINID                 
aaa        2242  400.0
bbb         235  500.0
ccc        5611  333.0

Answer 3

my suggestion would be to filter you dataframe with conditions related to other columns then apply sum function,

it goes something like this.

import pandas as pd

df=pd.Dataframe({a:[1,2,3],b:[2001,2015,2019],c:[1,0,1]})

aux=df[df.c>0]

sa=aux.a.sum()

sb=aux.b.sum()

My syntax may not be correct ( i didnt run the code ) but it will probably work and lead you to your answer

Good luck.

Answer 4

I know this post is old, but this might be helpful for others:

Using loc: df.loc[df['SINID'] == aaa].MONTREGL.sum()

Using groupby: df.groupby('SINID')['MONTREGL'].sum()

A similar question is answered in the following link (check Alex Riley's response):

How do I sum values in a column that match a given condition using pandas?

Good luck,

Sum column based on another column in Pandas DataFrame

Question

4 answers

solution1
7 ACCPTED 2019-05-30 09:11:28

solution2
3 2019-05-29 13:26:21

solution3
0 2019-05-29 13:16:14

solution4
0 2020-10-30 00:10:42

Sum column based on another column in Pandas DataFrame

Question

4 answers

solution1 7 ACCPTED 2019-05-30 09:11:28

solution2 3 2019-05-29 13:26:21

solution3 0 2019-05-29 13:16:14

solution4 0 2020-10-30 00:10:42

solution1
7 ACCPTED 2019-05-30 09:11:28

solution2
3 2019-05-29 13:26:21

solution3
0 2019-05-29 13:16:14

solution4
0 2020-10-30 00:10:42