Pivot table operations on pandas dataframe

I have the following dataframe in pandas:

df

DAY   YEAR    REGION   VALUE
  1   2000     A         12
  2   2000     A         10
  3   2000     A         13
  6   2000     A         15
  1   2001     A         3
  2   2001     A         40
  3   2001     A         83
  4   2001     A         95
  1   2000     B         124
  3   2000     B         102
  5   2000     B         131
  8   2000     B         150
  1   2001     B         30
  5   2001     B         4
  8   2001     B         8
  9   2001     B         12

I would like to create a new data frame such that each row contains a distinct combination of YEAR and REGION. It also contains a column which sums up the VALUE for that YEAR, REGION combination and another column which provides the maximum VALUE for the YEAR, REGION combination. The result should look like:

YEAR    REGION  SUM_VALUE   MAX_VALUE
2000    A       50          15
2001    A       221         95
2000    B       507         150
2001    B       54          30

Here is what I am doing:

import pandas

new_df = pandas.DataFrame()

for yr in df.YEAR.unique():
    for reg in df.REGION.unique():
        new_df = new_df.append({'YEAR': yr}, ignore_index=True)
        new_df = new_df.append({'REGION': reg}, ignore_index=True)

However, this creates a new row each time, and is not very Pythonic because of the extra for loops. Is there a better way to proceed?

Please note that this is a toy dataframe; the actual dataframe has several VALUE columns. The proposed solution should scale without my having to specify the names of the VALUE columns manually.

Group by 'YEAR' and 'REGION', then pass a list of functions to agg:

In [9]:
df.groupby(['YEAR','REGION'])['VALUE'].agg(['sum','max']).reset_index()

Out[9]:
   YEAR REGION  sum  max
0  2000      A   50   15
1  2000      B  507  150
2  2001      A  221   95
3  2001      B   54   30
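
To reproduce this, here is a minimal self-contained sketch that rebuilds the sample frame from the question and runs the same aggregation. The variable name `result` is only for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "DAY":    [1, 2, 3, 6, 1, 2, 3, 4, 1, 3, 5, 8, 1, 5, 8, 9],
    "YEAR":   [2000] * 4 + [2001] * 4 + [2000] * 4 + [2001] * 4,
    "REGION": ["A"] * 8 + ["B"] * 8,
    "VALUE":  [12, 10, 13, 15, 3, 40, 83, 95,
               124, 102, 131, 150, 30, 4, 8, 12],
})

# One row per (YEAR, REGION) pair, with the sum and max of VALUE.
result = df.groupby(["YEAR", "REGION"])["VALUE"].agg(["sum", "max"]).reset_index()
print(result)
```

Note that groupby sorts the group keys by default, so the rows come out ordered by YEAR and then REGION rather than in the order shown in the question.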

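EDIT:

If you want to name the aggregated columns, use named aggregation (pandas 0.25 and later). The older form, passing a renaming dict such as {'sum_VALUE':'sum','max_VALUE':'max'} to agg on a single column, was deprecated and raises a SpecificationError since pandas 1.0:

In [18]:
df.groupby(['YEAR','REGION'])['VALUE'].agg(sum_VALUE='sum', max_VALUE='max').reset_index()

Out[18]:
   YEAR REGION  sum_VALUE  max_VALUE
0  2000      A         50         15
1  2000      B        507        150
2  2001      A        221         95
3  2001      B         54         30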
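
The question also asks for something that scales to several VALUE columns without naming each one. One possible approach, sketched here under assumptions, is to aggregate every column that is not a grouping key and then flatten the resulting two-level column index. The column `VALUE2`, the variable names, and the choice to exclude `DAY` are all hypothetical and not taken from the original post.

```python
import pandas as pd

# Small illustrative frame; VALUE2 is a made-up second value column.
df = pd.DataFrame({
    "DAY":    [1, 2, 1, 2],
    "YEAR":   [2000, 2000, 2001, 2001],
    "REGION": ["A", "A", "A", "A"],
    "VALUE":  [12, 10, 3, 40],
    "VALUE2": [1.0, 2.0, 3.0, 4.0],
})

keys = ["YEAR", "REGION"]
# Treat every column that is neither a key nor DAY as a value column (assumption).
value_cols = list(df.columns.difference(keys + ["DAY"]))

out = df.groupby(keys)[value_cols].agg(["sum", "max"])
# Columns are now (column, function) pairs; flatten them to e.g. SUM_VALUE.
out.columns = [f"{func.upper()}_{col}" for col, func in out.columns]
out = out.reset_index()
print(out)
```

Here the value columns are derived from the frame itself, so adding another numeric column needs no change to the aggregation call.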
