Parse pandas column names to create multi-indexed dataframe
I have a DataFrame that looks like this:
region 2008_indicatorA 2008_indicatorB ...(2009..2019)... 2020_indicatorA 2020_indicatorB
=============================================================================================
State1 ... ... ... ...
State2 ... ... ... ...
...
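I need to extract the year from the column names into a separate year column, which also reduces the number of columns. The resulting DataFrame should look like this: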
region year indicatorA indicatorB
========================================
State1 2008 ... ...
State1 2009 ... ...
...
State1 (..2020) ... ...
...
State2 2008 ... ...
...
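Use DataFrame.set_index to move region into the index, split the remaining column names with str.split(expand=True) to get a MultiIndex in the columns, then name the column levels with DataFrame.rename_axis and reshape with DataFrame.stack: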
print(df)
region 2008_indicatorA 2008_indicatorB 2020_indicatorA 2020_indicatorB
0 State1 1 3 5 8
1 State2 7 5 3 9
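# Move 'region' into the index so only the year_indicator columns remain
df1 = df.set_index('region')
# Split '2008_indicatorA' into ('2008', 'indicatorA'), giving MultiIndex columns
df1.columns = df1.columns.str.split('_', expand=True)
# Name the first column level 'year', stack it into the rows, then flatten the index
df1 = df1.rename_axis(['year', None], axis=1).stack(0).reset_index()
print(df1)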
region year indicatorA indicatorB
0 State1 2008 1 3
1 State1 2020 5 8
2 State2 2008 7 5
3 State2 2020 3 9
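The steps above can be put together as a self-contained sketch. The sample values are the ones printed above; the variable names wide and tidy and the final integer conversion are additions for illustration, not part of the original answer. Because the year comes out of str.split, it is a string until it is cast, which matters if you later filter or sort on it numerically.

```python
import pandas as pd

# Sample frame with the same values as the printed example
df = pd.DataFrame({
    "region": ["State1", "State2"],
    "2008_indicatorA": [1, 7],
    "2008_indicatorB": [3, 5],
    "2020_indicatorA": [5, 3],
    "2020_indicatorB": [8, 9],
})

# Keep 'region' out of the column split by moving it into the index
wide = df.set_index("region")

# '2008_indicatorA' -> ('2008', 'indicatorA'): two-level column MultiIndex
wide.columns = wide.columns.str.split("_", expand=True)

# Stack the year level into the rows and return to a flat frame
tidy = wide.rename_axis(["year", None], axis=1).stack(0).reset_index()

# Assumption for illustration: cast the string years to integers
tidy["year"] = tidy["year"].astype(int)

print(tidy)
```

Depending on the installed pandas version, stack may emit a FutureWarning about its new implementation; the resulting rows are the same for this data.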
To get a DataFrame with a MultiIndex instead, omit DataFrame.reset_index:
df1 = df.set_index('region')
df1.columns = df1.columns.str.split('_', expand=True)
# Without reset_index, (region, year) stays as the row MultiIndex
df1 = df1.rename_axis(['year', None], axis=1).stack(0)
print(df1)
indicatorA indicatorB
region year
State1 2008 1 3
2020 5 8
State2 2008 7 5
2020 3 9
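As a brief sketch of how the multi-indexed result can be queried, the example below rebuilds the same frame and selects from it by region and by year. The variable names state1, year_2020 and value are made up for illustration. Note that the year labels are strings here ("2020", not 2020), since they come from splitting the column names.

```python
import pandas as pd

# Same sample data as in the answer
df = pd.DataFrame({
    "region": ["State1", "State2"],
    "2008_indicatorA": [1, 7],
    "2008_indicatorB": [3, 5],
    "2020_indicatorA": [5, 3],
    "2020_indicatorB": [8, 9],
})

df1 = df.set_index("region")
df1.columns = df1.columns.str.split("_", expand=True)
df1 = df1.rename_axis(["year", None], axis=1).stack(0)

# All years for one region; the result is indexed by year only
state1 = df1.loc["State1"]

# One year across all regions; the label must be the string "2020"
year_2020 = df1.xs("2020", level="year")

# A single cell, addressed by the (region, year) tuple and the column name
value = df1.loc[("State2", "2008"), "indicatorB"]

print(state1)
print(year_2020)
print(value)
```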