Grouping and Summing by Multiple Columns of Dataframe in Pandas

Question

I have a dataframe that looks like:

YEAR |  REGION  |  POWER  |
2009 |   West   |  1.66   |
2009 |   West   |  1.77   |
2009 |   East   |  10.6   |
2009 |   East   |  8.7    |
2010 |   West   |  11.9   |
2010 |   North  |  14.8   |
2010 |   North  |  4.6    |
2010 |   West   |  3.0    |
2011 |   East   |  7.0    |
2011 |   East   |  9.66   |

I want to sum the numerical values for POWER grouped by both the YEAR and the REGION so that I get something like:

YEAR |  REGION  |  POWER  |
2009 |   West   |  3.43   |
2009 |   East   |  19.3   |
2010 |   West   |  11.9   |
2010 |   North  |  19.4   |
2010 |   West   |  3.0    |
2011 |   East   |  16.66  |

I've tried:

df.groupby(['YEAR', 'REGION'])['POWER'].sum()

But I get a series with the values in POWER side by side instead of summed.

Can anyone help with this operation?
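To reproduce the examples in the answers, the sample data from the question can be built as a DataFrame. This is a minimal sketch that assumes POWER holds floats. If the column had been read in as strings, sum() would concatenate the values rather than add them, so checking the dtype first is worthwhile.

```python
import pandas as pd

# Sample data from the question; POWER is assumed to be numeric (float64).
df = pd.DataFrame({
    'YEAR':   [2009, 2009, 2009, 2009, 2010, 2010, 2010, 2010, 2011, 2011],
    'REGION': ['West', 'West', 'East', 'East', 'West',
               'North', 'North', 'West', 'East', 'East'],
    'POWER':  [1.66, 1.77, 10.6, 8.7, 11.9, 14.8, 4.6, 3.0, 7.0, 9.66],
})

# Confirm POWER is float64 and not object (strings) before summing.
print(df.dtypes)
```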

Answer 1

Call sum() on the groupby object, then call reset_index() to turn the group keys back into regular columns:

df.groupby(['YEAR', 'REGION']).sum().reset_index()

#    YEAR REGION  POWER
# 0  2009   East  19.30
# 1  2009   West   3.43
# 2  2010  North  19.40
# 3  2010   West  14.90
# 4  2011   East  16.66
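Passing as_index=False to groupby produces the same flat DataFrame without the separate reset_index() call. The sketch below rebuilds the question's sample data so it runs on its own. Selecting ['POWER'] limits the sum to that column, which matters if the real frame has other numeric columns.

```python
import pandas as pd

df = pd.DataFrame({
    'YEAR':   [2009, 2009, 2009, 2009, 2010, 2010, 2010, 2010, 2011, 2011],
    'REGION': ['West', 'West', 'East', 'East', 'West',
               'North', 'North', 'West', 'East', 'East'],
    'POWER':  [1.66, 1.77, 10.6, 8.7, 11.9, 14.8, 4.6, 3.0, 7.0, 9.66],
})

# as_index=False keeps YEAR and REGION as ordinary columns in the result;
# groups are sorted by key, so both 2010 West rows end up in one group.
out = df.groupby(['YEAR', 'REGION'], as_index=False)['POWER'].sum()
print(out)
```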

Answer 2

Create a grouper column using shift and cumsum, so that only consecutive rows with the same REGION within a YEAR are summed together:

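# transform returns a result aligned with df's original index, so it can be assigned directly as a new column
df['grp'] = df.groupby('YEAR')['REGION'].transform(lambda x: (x != x.shift(1).bfill()).cumsum())

df_out = df.groupby(['YEAR', 'REGION', 'grp'], sort=False).sum().reset_index()
df_out = df_out.drop(columns='grp')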

Output:

   YEAR REGION  POWER
0  2009   West   3.43
1  2009   East  19.30
2  2010   West  11.90
3  2010  North  19.40
4  2010   West   3.00
5  2011   East  16.66

Details: this is what the grouper column grp looks like before aggregation. Within each year, each row's REGION is compared with the previous row's REGION, and the comparison is True (1) wherever the region changes. A cumulative sum of those flags within the year then numbers the groups.

   YEAR REGION  POWER  grp
0  2009   West   1.66    0
1  2009   West   1.77    0
2  2009   East  10.60    1
3  2009   East   8.70    1
4  2010   West  11.90    0
5  2010  North  14.80    1
6  2010  North   4.60    1
7  2010   West   3.00    2
8  2011   East   7.00    0
9  2011   East   9.66    0
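The same shift/cumsum idea can also be written without the per-year groupby. Start a new run wherever YEAR or REGION differs from the previous row, then group on that run id. This sketch assumes the rows are already in the order shown in the question. sum_consecutive_runs is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

def sum_consecutive_runs(df: pd.DataFrame) -> pd.DataFrame:
    """Sum POWER over consecutive rows sharing YEAR and REGION (hypothetical helper)."""
    keys = df[['YEAR', 'REGION']]
    # True where the row differs from the previous row in YEAR or REGION;
    # the first row is compared against NaN, so it always starts a run.
    new_run = keys.ne(keys.shift()).any(axis=1)
    # A running count of run starts gives each consecutive run its own id.
    run_id = new_run.cumsum().rename('run')
    return (df.groupby([run_id, 'YEAR', 'REGION'], sort=False)['POWER']
              .sum()
              .reset_index()
              .drop(columns='run'))

df = pd.DataFrame({
    'YEAR':   [2009, 2009, 2009, 2009, 2010, 2010, 2010, 2010, 2011, 2011],
    'REGION': ['West', 'West', 'East', 'East', 'West',
               'North', 'North', 'West', 'East', 'East'],
    'POWER':  [1.66, 1.77, 10.6, 8.7, 11.9, 14.8, 4.6, 3.0, 7.0, 9.66],
})

result = sum_consecutive_runs(df)
print(result)
```

Because the run id already separates the two 2010 West runs, sort=False only keeps the output in the original row order.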

2 answers

solution1: 2, accepted, 2018-10-15 23:46:42

solution2: 0, 2018-10-16 01:37:23