

Reset the index for a pandas DataFrame created from a groupby or pivot?

I have data that contains prices, volumes and other data about various financial securities. My input data looks like the following:

import numpy as np
import pandas

prices = np.random.rand(15) * 100
volumes = np.random.randint(15, size=15) * 10
idx = pandas.Series([2007, 2007, 2007, 2007, 2007, 2008,
                     2008, 2008, 2008, 2008, 2009, 2009,
                     2009, 2009, 2009], name='year')
df = pandas.DataFrame({'price': prices, 'volume': volumes})
df.index = idx

# BELOW IS AN EXAMPLE OF WHAT THE INPUT MIGHT LOOK LIKE
# IT WON'T BE EXACT BECAUSE OF THE USE OF RANDOM
#           price  volume
# year
# 2007   0.121002      30
# 2007  15.256424      70
# 2007  44.479590      50
# 2007  29.096013       0
# 2007  21.424690       0
# 2008  23.019548      40
# 2008  90.011295       0
# 2008  88.487664      30
# 2008  51.609119      70
# 2008   4.265726      80
# 2009  34.402065     140
# 2009  10.259064     100
# 2009  47.024574     110
# 2009  57.614977     140
# 2009  54.718016      50

I want to produce a data frame that looks like:

year       2007       2008       2009
0      0.121002  23.019548  34.402065
1     15.256424  90.011295  10.259064
2     44.479590  88.487664  47.024574
3     29.096013  51.609119  57.614977
4     21.424690   4.265726  54.718016

I know of one way to produce the output above using groupby:

df = df.reset_index()
grouper = df.groupby('year')
df2 = None
for group, data in grouper:
    series = data['price'].copy()
    series.index = range(len(series))
    series.name = group
    df2 = pandas.DataFrame(series) if df2 is None else pandas.concat([df2, series], axis=1)

I also know that you can use pivot to get a DataFrame that has NaNs for the missing indices on the pivot:

# df = df.reset_index()
df.pivot(columns='year', values='price')

# Output
# year       2007       2008       2009
# 0      0.121002        NaN        NaN
# 1     15.256424        NaN        NaN
# 2     44.479590        NaN        NaN
# 3     29.096013        NaN        NaN
# 4     21.424690        NaN        NaN
# 5           NaN  23.019548        NaN
# 6           NaN  90.011295        NaN
# 7           NaN  88.487664        NaN
# 8           NaN  51.609119        NaN
# 9           NaN   4.265726        NaN
# 10          NaN        NaN  34.402065
# 11          NaN        NaN  10.259064
# 12          NaN        NaN  47.024574
# 13          NaN        NaN  57.614977
# 14          NaN        NaN  54.718016

My question is the following:

Is there a way to create my output DataFrame in the groupby without building each series by hand, or is there a way to re-index my input DataFrame so that I get the desired output using pivot?

You need to label the rows within each year 0-4. To do this, use cumcount after grouping. Then you can pivot correctly using that new column as the index.

df['year_count'] = df.groupby(level='year').cumcount()
df.reset_index().pivot(index='year_count', columns='year', values='price')

year             2007       2008       2009
year_count                                 
0           61.682275  32.729113  54.859700
1           44.231296   4.453897  45.325802
2           65.850231  82.023960  28.325119
3           29.098607  86.046499  71.329594
4           67.864723  43.499762  19.255214
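
To make the intermediate step visible, here is a minimal sketch of the same approach on a small deterministic frame. The six price values are made up for illustration and stand in for the question's random data; the variable name `wide` is also just an illustrative choice.

```python
import pandas

# Made-up stand-in for the question's random data: two rows per year.
df = pandas.DataFrame(
    {'price': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]},
    index=pandas.Index([2007, 2007, 2008, 2008, 2009, 2009], name='year'),
)

# cumcount numbers the rows within each year, starting from 0.
df['year_count'] = df.groupby(level='year').cumcount()
print(df['year_count'].tolist())  # [0, 1, 0, 1, 0, 1]

# Each (year_count, year) pair is now unique, so pivot lines the
# years up side by side instead of leaving NaN gaps.
wide = df.reset_index().pivot(index='year_count', columns='year', values='price')
print(wide)
```

If the groups had different sizes, the shorter years would presumably still show NaN in their trailing rows, since pivot fills any missing (year_count, year) combination with NaN.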

You can use groupby with apply to build a new Series from each group's values (a NumPy array), which discards the original index, and then reshape with unstack:

print(df.groupby(level='year')['price'].apply(lambda x: pandas.Series(x.values)).unstack(0))
year       2007       2008       2009
0     55.360804  68.671626  78.809139
1     50.246485  55.639250  84.483814
2     17.646684  14.386347  87.185550
3     54.824732  91.846018  60.793002
4     24.303751  50.908714  22.084445
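
The two stages of this answer can be looked at separately. The sketch below again uses made-up prices rather than the question's data, and the names `stacked` and `wide` are illustrative: the apply step yields a Series indexed by (year, position), and unstack(0) then moves the year level into the columns.

```python
import pandas

# Made-up stand-in for the question's random data: two rows per year.
df = pandas.DataFrame(
    {'price': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]},
    index=pandas.Index([2007, 2007, 2008, 2008, 2009, 2009], name='year'),
)

# Wrapping each group's values in a fresh Series gives it a default
# 0..n-1 index, so the result is indexed by (year, position).
stacked = df.groupby(level='year')['price'].apply(lambda x: pandas.Series(x.values))
print(stacked)

# unstack(0) pivots the outer level (year) into the columns.
wide = stacked.unstack(0)
print(wide)
```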

