Pivot table operations on pandas dataframe

Question

I have the following dataframe in pandas:

df

DAY   YEAR    REGION   VALUE
  1   2000     A         12
  2   2000     A         10
  3   2000     A         13
  6   2000     A         15
  1   2001     A         3
  2   2001     A         40
  3   2001     A         83
  4   2001     A         95
  1   2000     B         124
  3   2000     B         102
  5   2000     B         131
  8   2000     B         150
  1   2001     B         30
  5   2001     B         4
  8   2001     B         8
  9   2001     B         12

I would like to create a new dataframe in which each row holds a distinct combination of YEAR and REGION. It should also have one column with the sum of VALUE for that YEAR, REGION combination and another column with the maximum VALUE for that combination. The result should look like:

YEAR    REGION  SUM_VALUE   MAX_VALUE
2000    A       50          15
2001    A       221         95
2000    B       507         150
2001    B       54          30

Here is what I am doing:

import pandas

new_df = pandas.DataFrame()

# DataFrame.append requires pandas < 2.0
for yr in df.YEAR.unique():
    for reg in df.REGION.unique():
        new_df = new_df.append({'YEAR': yr}, ignore_index=True)
        new_df = new_df.append({'REGION': reg}, ignore_index=True)

However, this creates a new row on each append, and it is not very Pythonic because of the extra for loops. Is there a better way to proceed?

Please note that this is a toy dataframe; the actual dataframe has several VALUE columns. The proposed solution should scale without my having to specify the names of the VALUE columns manually.

Answer 1

Group by 'YEAR' and 'REGION', then pass a list of aggregation functions to agg:

In [9]:
df.groupby(['YEAR','REGION'])['VALUE'].agg(['sum','max']).reset_index()

Out[9]:
   YEAR REGION  sum  max
0  2000      A   50   15
1  2000      B  507  150
2  2001      A  221   95
3  2001      B   54   30
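For anyone who wants to reproduce this, here is a minimal self-contained sketch that rebuilds the dataframe from the question and runs the same groupby/agg call. The variable name `result` is only for illustration.

```python
import pandas as pd

# Rebuild the toy dataframe from the question.
df = pd.DataFrame({
    "DAY":    [1, 2, 3, 6, 1, 2, 3, 4, 1, 3, 5, 8, 1, 5, 8, 9],
    "YEAR":   [2000] * 4 + [2001] * 4 + [2000] * 4 + [2001] * 4,
    "REGION": ["A"] * 8 + ["B"] * 8,
    "VALUE":  [12, 10, 13, 15, 3, 40, 83, 95,
               124, 102, 131, 150, 30, 4, 8, 12],
})

# One row per (YEAR, REGION) pair, with the sum and max of VALUE.
result = (
    df.groupby(["YEAR", "REGION"])["VALUE"]
      .agg(["sum", "max"])
      .reset_index()
)
print(result)
```

The groups come back sorted by YEAR and then REGION, which is why the row order differs from the desired output in the question.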

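EDIT:

If you want to name the aggregated columns: passing a renaming dict such as {'sum_VALUE':'sum','max_VALUE':'max'} to agg on a single column was deprecated in pandas 0.20 and raises an error from pandas 1.0 onward. Use named aggregation instead, available in pandas 0.25 and later:

In [18]:
df.groupby(['YEAR','REGION'])['VALUE'].agg(sum_VALUE='sum', max_VALUE='max').reset_index()

Out[18]:
   YEAR REGION  sum_VALUE  max_VALUE
0  2000      A         50         15
1  2000      B        507        150
2  2001      A        221         95
3  2001      B         54         30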
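The question also asks for something that scales to several VALUE columns without naming them. One possible sketch, not part of the original answer: treat every column that is not a grouping key (or DAY) as a value column, aggregate them all at once, and flatten the resulting two-level column index. The column names VALUE_1 and VALUE_2 and the SUM_/MAX_ naming scheme are assumptions made up for this example.

```python
import pandas as pd

# Hypothetical frame with two value columns (names invented for illustration).
df = pd.DataFrame({
    "DAY":     [1, 2, 1, 2, 1, 3],
    "YEAR":    [2000, 2000, 2001, 2001, 2000, 2000],
    "REGION":  ["A", "A", "A", "A", "B", "B"],
    "VALUE_1": [12, 10, 3, 40, 124, 102],
    "VALUE_2": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

keys = ["YEAR", "REGION"]
# Assumption: everything except the keys and DAY is a value column.
value_cols = list(df.columns.difference(keys + ["DAY"]))

# agg with a list of functions yields columns like ('VALUE_1', 'sum').
wide = df.groupby(keys)[value_cols].agg(["sum", "max"])

# Flatten ('VALUE_1', 'sum') into 'SUM_VALUE_1', and so on.
wide.columns = [f"{func.upper()}_{col}" for col, func in wide.columns]
result = wide.reset_index()
print(result)
```

If the real dataframe has other non-value columns besides DAY, they would need to be added to the exclusion list.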

solution1
2 ACCEPTED 2016-01-05 18:28:54
