Pandas pivot_table - How to make a MultiIndex from a mix of column values and column names?
I'm relatively new to Pandas. I have a DataFrame in the form:
       A    B    C        D        E
0      1  1.1    a  23.7853  18.2647
1      1  1.2    a  23.7118  17.2387
2      1  1.1    b  24.1873  17.3874
3      1  1.2    b  23.1873  18.1748
4      2  1.1    a  24.1872  18.1847
...  ...  ...  ...      ...      ...
I would like to pivot it so that the rows have a three-level MultiIndex built from the values in columns A and C plus the column names ["D", "E"]. The values from B should become the new column headers, and the data in columns D and E should fill the cells. All values are one-to-one (with some NaNs). If I understand correctly, I need to use pivot_table() instead of just pivot() because of the MultiIndex. Ultimately I want a table that looks like:
B                  1.1      1.2  ...
A   C   col-name
1   a   D      23.7853  23.7118  ...
        E      18.2647  17.2387  ...
    b   D      24.1873  23.1873  ...
        E      17.3874  18.1748  ...
2   a   D      24.1872  23.1987  ...
        E      18.1847  19.2387  ...
... ... ...        ...      ...  ...
I'm pretty sure the answer is to use some command like
pd.pivot_table(df, columns=["B"], values=["D","E"], index=["A","C","???"])
I'm unsure what to put in the "values" and "index" arguments to get the right behavior.
If I can't do this with a single pivot_table command, do I need to construct my MultiIndex ahead of time? Then what?
Thanks!
Create a MultiIndex from columns A, C and B, then use stack + unstack to reshape the DataFrame:
df.set_index(['A', 'C', 'B']).stack().unstack(-2)
B                  1.1      1.2
A   C
1   a   D      23.7853  23.7118
        E      18.2647  17.2387
    b   D      24.1873  23.1873
        E      17.3874  18.1748
2   a   D      24.1872      NaN
        E      18.1847      NaN
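For anyone who wants to run this end to end, here is a minimal sketch of the same stack + unstack approach. It assumes the DataFrame holds only the five rows visible in the question. The final rename_axis step and the "col_name" label are my additions, used to name the otherwise unnamed D/E level.

```python
import pandas as pd

# Only the five rows shown in the question; the real data has more.
df = pd.DataFrame({
    "A": [1, 1, 1, 1, 2],
    "B": [1.1, 1.2, 1.1, 1.2, 1.1],
    "C": ["a", "a", "b", "b", "a"],
    "D": [23.7853, 23.7118, 24.1873, 23.1873, 24.1872],
    "E": [18.2647, 17.2387, 17.3874, 18.1748, 18.1847],
})

# stack() moves the column labels D and E into a new innermost index level,
# so the index levels become A, C, B and an unnamed D/E level.
long = df.set_index(["A", "C", "B"]).stack()

# unstack(-2) moves the second-to-last level (B) into the columns.
result = long.unstack(-2)

# Optional: give the D/E level a name (the label itself is an assumption).
result = result.rename_axis(index=["A", "C", "col_name"])

print(result)
```

Because the level is addressed by position, unstack(-2) relies on B being the last key passed to set_index. Writing unstack("B") instead should select the same level by name.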
You can use pd.pivot_table() together with .stack(), as follows:
(pd.pivot_table(df, index=['A', 'C'], columns='B', values=["D","E"])
.rename_axis(columns=['col_name', 'B']) # set axis name for ["D","E"]
.stack(level=0)
)
Result:
B                  1.1      1.2
A   C   col_name
1   a   D      23.7853  23.7118
        E      18.2647  17.2387
    b   D      24.1873  23.1873
        E      17.3874  18.1748
2   a   D      24.1872      NaN
        E      18.1847      NaN
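A self-contained sketch of this approach follows, again assuming only the five rows shown in the question. Note that pivot_table() aggregates duplicate entries (with the mean by default), which changes nothing when the values are one-to-one. As far as I know, plain DataFrame.pivot() also accepts a list for index in pandas 1.1 and later, so the second variant below should give the same table without any aggregation. Treat that version requirement as an assumption and check it against your installation.

```python
import pandas as pd

# Only the five rows shown in the question; the real data has more.
df = pd.DataFrame({
    "A": [1, 1, 1, 1, 2],
    "B": [1.1, 1.2, 1.1, 1.2, 1.1],
    "C": ["a", "a", "b", "b", "a"],
    "D": [23.7853, 23.7118, 24.1873, 23.1873, 24.1872],
    "E": [18.2647, 17.2387, 17.3874, 18.1748, 18.1847],
})

# Variant 1: pivot_table, then move the D/E column level into the index.
via_pivot_table = (
    pd.pivot_table(df, index=["A", "C"], columns="B", values=["D", "E"])
    .rename_axis(columns=["col_name", "B"])  # name the D/E level
    .stack(level=0)                          # D/E level becomes the innermost index level
)

# Variant 2: plain pivot with a list for index (assumes pandas >= 1.1).
via_pivot = (
    df.pivot(index=["A", "C"], columns="B", values=["D", "E"])
    .rename_axis(columns=["col_name", "B"])
    .stack(level=0)
)

print(via_pivot_table)
print(via_pivot)
```

If the data did contain duplicate (A, C, B) combinations, pivot() would raise an error while pivot_table() would silently average them, so the choice between the two depends on which behavior you prefer.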