Reset the index for a pandas DataFrame created from a groupby or pivot?
My data contains prices, volumes and other data about various financial securities. My input data looks like the following:
import numpy as np
import pandas
prices = np.random.rand(15) * 100
volumes = np.random.randint(15, size=15) * 10
idx = pandas.Series([2007, 2007, 2007, 2007, 2007, 2008,
                     2008, 2008, 2008, 2008, 2009, 2009,
                     2009, 2009, 2009], name='year')
# DataFrame.from_items was removed in pandas 1.0; a dict keeps the column order.
df = pandas.DataFrame({'price': prices, 'volume': volumes})
df.index = idx
# BELOW IS AN EXAMPLE OF WHAT INPUT MIGHT LOOK LIKE
# IT WON'T BE EXACT BECAUSE OF THE USE OF RANDOM
# price volume
# year
# 2007 0.121002 30
# 2007 15.256424 70
# 2007 44.479590 50
# 2007 29.096013 0
# 2007 21.424690 0
# 2008 23.019548 40
# 2008 90.011295 0
# 2008 88.487664 30
# 2008 51.609119 70
# 2008 4.265726 80
# 2009 34.402065 140
# 2009 10.259064 100
# 2009 47.024574 110
# 2009 57.614977 140
# 2009 54.718016 50
I would like to produce a DataFrame that looks like this:
year 2007 2008 2009
0 0.121002 23.019548 34.402065
1 15.256424 90.011295 10.259064
2 44.479590 88.487664 47.024574
3 29.096013 51.609119 57.614977
4 21.424690 4.265726 54.718016
I know of one way to produce the above output using groupby:
df = df.reset_index()
grouper = df.groupby('year')
df2 = None
for group, data in grouper:
    series = data['price'].copy()
    series.index = range(len(series))
    series.name = group
    df2 = pandas.DataFrame(series) if df2 is None else pandas.concat([df2, series], axis=1)
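I also know that you can pivot, but the resulting DataFrame keeps the original row index, so each row holds a value for only one year and NaN everywhere else: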
# df = df.reset_index()
df.pivot(columns='year', values='price')
# Output
# year 2007 2008 2009
# 0 0.121002 NaN NaN
# 1 15.256424 NaN NaN
# 2 44.479590 NaN NaN
# 3 29.096013 NaN NaN
# 4 21.424690 NaN NaN
# 5 NaN 23.019548 NaN
# 6 NaN 90.011295 NaN
# 7 NaN 88.487664 NaN
# 8 NaN 51.609119 NaN
# 9 NaN 4.265726 NaN
# 10 NaN NaN 34.402065
# 11 NaN NaN 10.259064
# 12 NaN NaN 47.024574
# 13 NaN NaN 57.614977
# 14 NaN NaN 54.718016
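The NaN pattern above follows from the row labels: when `pivot` is called without `index=`, it keeps the existing index, and each label belongs to exactly one year. A minimal sketch on made-up data (the values below are invented for illustration, not taken from the question):

```python
import pandas as pd

# Toy stand-in for the question's data after reset_index(): rows are labelled 0..3.
toy = pd.DataFrame({"year": [2007, 2007, 2008, 2008],
                    "price": [1.0, 2.0, 3.0, 4.0]})

# No index= argument, so the existing row labels 0..3 become the pivot index.
wide_nan = toy.pivot(columns="year", values="price")

# Each row label occurs in only one year, so every other cell in that row is NaN.
print(wide_nan)
print(wide_nan.isna().sum().sum(), "cells are NaN")
```

Getting a compact table therefore requires row labels that repeat across years, such as the position of each row within its year.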
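My question is as follows:
Is there a way to create my output DataFrame in the groupby without creating the series, or is there a way I can re-index my input DataFrame so that I get the desired output using pivot?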
You need to label the rows within each year 0-4. To do this, use cumcount after grouping. You can then pivot correctly using that new column as the index.
df['year_count'] = df.groupby(level='year').cumcount()
df.reset_index().pivot(index='year_count', columns='year', values='price')
year 2007 2008 2009
year_count
0 61.682275 32.729113 54.859700
1 44.231296 4.453897 45.325802
2 65.850231 82.023960 28.325119
3 29.098607 86.046499 71.329594
4 67.864723 43.499762 19.255214
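Because the output above comes from random data, here is a small deterministic sketch of the same cumcount-then-pivot idea. The six prices and the variable name `wide` are made up for illustration; only `groupby`, `cumcount`, `reset_index` and `pivot` come from the answer.

```python
import pandas as pd

# Deterministic stand-in for the question's random prices (values are invented).
df = pd.DataFrame(
    {"price": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]},
    index=pd.Index([2007, 2007, 2008, 2008, 2009, 2009], name="year"),
)

# Position of each row within its year: 0, 1, 0, 1, 0, 1.
df["year_count"] = df.groupby(level="year").cumcount()

# The within-year position becomes the row index and the years become columns.
wide = df.reset_index().pivot(index="year_count", columns="year", values="price")
print(wide)
```

If the years had different numbers of rows, the shorter columns would presumably be padded with NaN at the bottom rather than raising an error.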
You can use groupby with apply, building a new Series from the NumPy array returned by values so that each group gets a fresh 0-based index, and then reshape with unstack:
print(df.groupby(level='year')['price'].apply(lambda x: pandas.Series(x.values)).unstack(0))
year 2007 2008 2009
0 55.360804 68.671626 78.809139
1 50.246485 55.639250 84.483814
2 17.646684 14.386347 87.185550
3 54.824732 91.846018 60.793002
4 24.303751 50.908714 22.084445
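To make the intermediate step visible, the following sketch splits that one-liner in two on a tiny deterministic frame. The data and the names `stacked` and `wide` are assumptions made for illustration, and `to_numpy()` is used here in place of `.values`.

```python
import pandas as pd

# Deterministic stand-in for the question's random prices (values are invented).
df = pd.DataFrame(
    {"price": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]},
    index=pd.Index([2007, 2007, 2008, 2008, 2009, 2009], name="year"),
)

# Rebuilding each group from its raw array discards the repeated year labels
# and gives the group a fresh 0..n-1 index; groupby adds the year back as the
# outer index level, so the result is indexed by (year, position).
stacked = df.groupby(level="year")["price"].apply(lambda s: pd.Series(s.to_numpy()))

# Move the outer level (year) into the columns; positions remain as rows.
wide = stacked.unstack(0)
print(wide)
```

Compared with the cumcount approach, this avoids adding a helper column to `df`, at the cost of building one temporary Series per group.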