I have a dataframe that looks like:
YEAR | REGION | POWER |
2009 | West | 1.66 |
2009 | West | 1.77 |
2009 | East | 10.6 |
2009 | East | 8.7 |
2010 | West | 11.9 |
2010 | North | 14.8 |
2010 | North | 4.6 |
2010 | West | 3.0 |
2011 | East | 7.0 |
2011 | East | 9.66 |
I want to sum the numerical values for POWER grouped by both the YEAR and the REGION so that I get something like:
YEAR | REGION | POWER |
2009 | West | 3.43 |
2009 | East | 19.3 |
2010 | West | 11.9 |
2010 | North | 19.4 |
2010 | West | 3.0 |
2011 | East | 16.66 |
I've tried:
df.groupby(['YEAR', 'REGION'])['POWER'].sum()
But I get a Series with the POWER values side by side instead of summed.
Can anyone help me with this operation?
Run sum on the groupby, and then reset_index() to flatten it. Like so:
df.groupby(['YEAR', 'REGION']).sum().reset_index()
# YEAR REGION POWER
# 0 2009 East 19.30
# 1 2009 West 3.43
# 2 2010 North 19.40
# 3 2010 West 14.90
# 4 2011 East 16.66
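For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the question, and as_index=False is used as an alternative to reset_index() (the variable name totals is only for illustration). Note that this sums every row sharing the same YEAR and REGION, so the two separate 2010 West rows are combined into one.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "YEAR": [2009, 2009, 2009, 2009, 2010, 2010, 2010, 2010, 2011, 2011],
    "REGION": ["West", "West", "East", "East", "West",
               "North", "North", "West", "East", "East"],
    "POWER": [1.66, 1.77, 10.6, 8.7, 11.9, 14.8, 4.6, 3.0, 7.0, 9.66],
})

# as_index=False keeps YEAR and REGION as ordinary columns,
# which here has the same effect as calling reset_index() afterwards.
totals = df.groupby(["YEAR", "REGION"], as_index=False)["POWER"].sum()
print(totals)
```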
Create a grouper column using shift and cumsum:
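# group_keys=False keeps the original row index so the result aligns with df on newer pandas versions
df['grp'] = df.groupby('YEAR', group_keys=False)['REGION'].apply(lambda x: (x != x.shift(1).bfill()).cumsum())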
df_out = df.groupby(['YEAR','REGION','grp'], sort=False).sum().reset_index()
df_out = df_out.drop('grp', axis=1)
Output:
YEAR REGION POWER
0 2009 West 3.43
1 2009 East 19.30
2 2010 West 11.90
3 2010 North 19.40
4 2010 West 3.00
5 2011 East 16.66
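Details: this is what the grouper column, grp, looks like before aggregation. Within each year, each row's region is compared with the previous row's region, and a difference is flagged as True (the first row of a year is compared with itself, so it is never flagged). The cumsum of those flags within the year then increases by 1 at every change, so consecutive rows with the same region share a group number.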
YEAR REGION POWER grp
0 2009 West 1.66 0
1 2009 West 1.77 0
2 2009 East 10.60 1
3 2009 East 8.70 1
4 2010 West 11.90 0
5 2010 North 14.80 1
6 2010 North 4.60 1
7 2010 West 3.00 2
8 2011 East 7.00 0
9 2011 East 9.66 0
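The same idea can probably be written without the per-year apply, by flagging a new run whenever either YEAR or REGION changes from one row to the next. This is only a sketch of that variation, not the method used above; the names keys, changed and run_id are made up for illustration, and the DataFrame is rebuilt from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "YEAR": [2009, 2009, 2009, 2009, 2010, 2010, 2010, 2010, 2011, 2011],
    "REGION": ["West", "West", "East", "East", "West",
               "North", "North", "West", "East", "East"],
    "POWER": [1.66, 1.77, 10.6, 8.7, 11.9, 14.8, 4.6, 3.0, 7.0, 9.66],
})

keys = df[["YEAR", "REGION"]]
# True on every row whose (YEAR, REGION) pair differs from the row above it.
# The first row is compared against missing values, so it is always flagged.
changed = (keys != keys.shift()).any(axis=1)
# Running count of changes: one id per run of consecutive identical pairs.
run_id = changed.cumsum()

df_out = (
    df.groupby(run_id, sort=False)
      .agg(YEAR=("YEAR", "first"), REGION=("REGION", "first"), POWER=("POWER", "sum"))
      .reset_index(drop=True)
)
print(df_out)
```

Because run_id counts across the whole frame rather than restarting each year, no temporary grp column has to be added and dropped again. Like the version above, it relies on the rows already being in the order you want the runs detected in.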