Reset the index for a pandas DataFrame created from a groupby or pivot?

Question

I have data that contains prices, volumes and other data about various financial securities. My input data looks like the following:

import numpy as np
import pandas

prices = np.random.rand(15) * 100
volumes = np.random.randint(15, size=15) * 10
idx = pandas.Series([2007, 2007, 2007, 2007, 2007, 2008,
                     2008, 2008, 2008, 2008, 2009, 2009,
                     2009, 2009, 2009], name='year')
# DataFrame.from_items was removed in pandas 1.0; a dict keeps the same column order
df = pandas.DataFrame({'price': prices, 'volume': volumes})
df.index = idx

# BELOW IS AN EXAMPLE OF WHAT THE INPUT MIGHT LOOK LIKE
# IT WON'T BE EXACT BECAUSE OF THE USE OF RANDOM
#           price  volume
# year
# 2007   0.121002      30
# 2007  15.256424      70
# 2007  44.479590      50
# 2007  29.096013       0
# 2007  21.424690       0
# 2008  23.019548      40
# 2008  90.011295       0
# 2008  88.487664      30
# 2008  51.609119      70
# 2008   4.265726      80
# 2009  34.402065     140
# 2009  10.259064     100
# 2009  47.024574     110
# 2009  57.614977     140
# 2009  54.718016      50

I want to produce a data frame that looks like:

year       2007       2008       2009
0      0.121002  23.019548  34.402065
1     15.256424  90.011295  10.259064
2     44.479590  88.487664  47.024574
3     29.096013  51.609119  57.614977
4     21.424690   4.265726  54.718016

I know of one way to produce the output above using groupby:

df = df.reset_index()
grouper = df.groupby('year')
df2 = None
for group, data in grouper:
    series = data['price'].copy()
    series.index = range(len(series))
    series.name = group
    df2 = pandas.DataFrame(series) if df2 is None else pandas.concat([df2, series], axis=1)

I also know that pivot produces a DataFrame with NaNs wherever a row index has no value for a given year:

# df = df.reset_index()  # needed only if not already done above
df.pivot(columns='year', values='price')

# Output
# year       2007       2008       2009
# 0      0.121002        NaN        NaN
# 1     15.256424        NaN        NaN
# 2     44.479590        NaN        NaN
# 3     29.096013        NaN        NaN
# 4     21.424690        NaN        NaN
# 5           NaN  23.019548        NaN
# 6           NaN  90.011295        NaN
# 7           NaN  88.487664        NaN
# 8           NaN  51.609119        NaN
# 9           NaN   4.265726        NaN
# 10          NaN        NaN  34.402065
# 11          NaN        NaN  10.259064
# 12          NaN        NaN  47.024574
# 13          NaN        NaN  57.614977
# 14          NaN        NaN  54.718016
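
The NaNs appear because pivot keeps the existing row labels (0 to 14), and each label belongs to a single year. The following is a minimal sketch of that behaviour, not part of the original post; the seed and the names wide and compact are assumptions made for illustration. It also checks that dropping the NaNs column by column loses no prices.

```python
import numpy as np
import pandas

# Seeded stand-in for the random input above (the seed is an assumption).
rng = np.random.default_rng(0)
years = np.repeat([2007, 2008, 2009], 5)
df = pandas.DataFrame({'year': years, 'price': rng.random(15) * 100})

# pivot keeps the row labels 0..14, and each label belongs to one year only,
# so every row has a value in exactly one column and NaN elsewhere.
wide = df.pivot(columns='year', values='price')
print(wide.notna().sum(axis=1).eq(1).all())

# Drop the NaNs column by column and renumber the surviving rows from 0.
compact = wide.apply(lambda col: col.dropna().reset_index(drop=True))
print(compact)
```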

My question is the following:

Is there a way that I can create my output DataFrame in the groupby without creating the series, or is there a way I can re-index my input DataFrame so that I get the desired output using pivot?

Answer 1

You need to number the rows within each year 0-4. To do this, use cumcount after grouping by year (starting from the original df, with year as the index). Then you can pivot correctly, using that new column as the index.

df['year_count'] = df.groupby(level='year').cumcount()
df.reset_index().pivot(index='year_count', columns='year', values='price')

year             2007       2008       2009
year_count                                 
0           61.682275  32.729113  54.859700
1           44.231296   4.453897  45.325802
2           65.850231  82.023960  28.325119
3           29.098607  86.046499  71.329594
4           67.864723  43.499762  19.255214
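
For anyone who wants to run the steps above end to end, here is a self-contained sketch. The seeded generator and the name result are assumptions added for reproducibility; the cumcount and pivot calls are the ones from this answer.

```python
import numpy as np
import pandas

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
df = pandas.DataFrame(
    {'price': rng.random(15) * 100, 'volume': rng.integers(15, size=15) * 10},
    index=pandas.Index(np.repeat([2007, 2008, 2009], 5), name='year'),
)

# Number the rows within each year: 0, 1, 2, ... restarting for every group.
df['year_count'] = df.groupby(level='year').cumcount()

# Use that counter as the row label and the year as the column label.
result = df.reset_index().pivot(index='year_count', columns='year', values='price')
print(result)
```

If the years had different numbers of rows, the shorter columns would be padded with NaN at the bottom.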

Answer 2

You can use groupby with apply to build a new Series from each group's values (a NumPy array, so the original index is discarded), and then reshape with unstack:

print(df.groupby(level='year')['price'].apply(lambda x: pandas.Series(x.values)).unstack(0))
year       2007       2008       2009
0     55.360804  68.671626  78.809139
1     50.246485  55.639250  84.483814
2     17.646684  14.386347  87.185550
3     54.824732  91.846018  60.793002
4     24.303751  50.908714  22.084445
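
As a rough, self-contained sketch of this approach, the block below runs the same groupby/apply/unstack chain on seeded data and compares it with a variant that avoids apply by adding a per-year counter as a second index level. The seed and the names via_apply, counter and via_index are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
df = pandas.DataFrame(
    {'price': rng.random(15) * 100},
    index=pandas.Index(np.repeat([2007, 2008, 2009], 5), name='year'),
)

# This answer's chain: rebuild each group's prices on a fresh 0..n-1 index,
# then move the year level of the resulting MultiIndex into the columns.
via_apply = (
    df.groupby(level='year')['price']
      .apply(lambda x: pandas.Series(x.values))
      .unstack(0)
)

# Variant without apply: append the per-year row counter as a second
# index level, then unstack the year level in the same way.
counter = df.groupby(level='year').cumcount().to_numpy()
via_index = df.set_index(counter, append=True)['price'].unstack(0)

print(via_index)
```

Both frames should hold the same prices in the same positions; the variant may be preferable on large inputs because it does not call a Python function once per group.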

solution1
3 ACCEPTED 2017-01-10 04:44:06

solution2
0 2017-01-10 08:25:12
