Pandas Dataframe flatten crosstab with multilevel index

I have an Excel file which looks like this:

+-------+-------+-------+-------+-------+-------+
|       | Cat1  | Cat1  | Cat1  | Cat1  | Cat1  |
+-------+-------+-------+-------+-------+-------+
|       | Type1 | Type1 | Type1 | Type1 | Type2 |
+-------+-------+-------+-------+-------+-------+
|       | 2018  | 2018  | 2018  | 2018  | 2018  |
+-------+-------+-------+-------+-------+-------+
| Name  | 1Q    | 2Q    | 3Q    | 4Q    | 1Q    |
+-------+-------+-------+-------+-------+-------+
| Name1 | 1     | 5     | 3     | 5     | 4     |
+-------+-------+-------+-------+-------+-------+
| Name2 | 3     | 23    | 4     | 2     | 4     |
+-------+-------+-------+-------+-------+-------+
| Name3 | 4     | 3     | 5     | 3     | 44    |
+-------+-------+-------+-------+-------+-------+
| Name4 | 3     | 6     | 5     | 4     | 2     |
+-------+-------+-------+-------+-------+-------+

...and so on.

I want to reshape it into long format, with one row per value, so that it looks like this:

+-------+------+-------+------+---------+-------+
| Name  | Cat  | Type  | Year | Quarter | Value |
+-------+------+-------+------+---------+-------+
| Name1 | Cat1 | Type1 | 2018 | 1Q      | 1     |
+-------+------+-------+------+---------+-------+
| Name1 | Cat1 | Type1 | 2018 | 2Q      | 5     |
+-------+------+-------+------+---------+-------+
| Name1 | Cat1 | Type1 | 2018 | 3Q      | 3     |
+-------+------+-------+------+---------+-------+
| Name1 | Cat1 | Type1 | 2018 | 4Q      | 5     |
+-------+------+-------+------+---------+-------+
| Name1 | Cat1 | Type2 | 2018 | 1Q      | 4     |
+-------+------+-------+------+---------+-------+

I've loaded it into a pandas DataFrame and am unsure how to proceed now. Is it melt, stack, unstack, MultiIndex...?

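Use stack on all four column levels, then reset the index and rename the columns. This assumes Name is the row index and the four header rows were read as a column MultiIndex: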

print(df.columns)
MultiIndex(levels=[['Cat1'], ['Type1', 'Type2'], ['2018'], ['1Q', '2Q', '3Q', '4Q']],
           labels=[[0, 0, 0, 0, 0], [0, 0, 0, 0, 1], [0, 0, 0, 0, 0], [0, 1, 2, 3, 0]])


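# Move all four column levels into the row index, then turn the index back into columns
df = df.stack([0, 1, 2, 3]).reset_index()
# Label the former index and column levels, plus the stacked values
df.columns = ['Name', 'Cat', 'Type', 'Year', 'Quarter', 'Value']
print(df)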
     Name   Cat   Type  Year Quarter  Value
0   Name1  Cat1  Type1  2018      1Q    1.0
1   Name1  Cat1  Type1  2018      2Q    5.0
2   Name1  Cat1  Type1  2018      3Q    3.0
3   Name1  Cat1  Type1  2018      4Q    5.0
4   Name1  Cat1  Type2  2018      1Q    4.0
5   Name2  Cat1  Type1  2018      1Q    3.0
6   Name2  Cat1  Type1  2018      2Q   23.0
7   Name2  Cat1  Type1  2018      3Q    4.0
8   Name2  Cat1  Type1  2018      4Q    2.0
9   Name2  Cat1  Type2  2018      1Q    4.0
10  Name3  Cat1  Type1  2018      1Q    4.0
11  Name3  Cat1  Type1  2018      2Q    3.0
12  Name3  Cat1  Type1  2018      3Q    5.0
13  Name3  Cat1  Type1  2018      4Q    3.0
14  Name3  Cat1  Type2  2018      1Q   44.0
15  Name4  Cat1  Type1  2018      1Q    3.0
16  Name4  Cat1  Type1  2018      2Q    6.0
17  Name4  Cat1  Type1  2018      3Q    5.0
18  Name4  Cat1  Type1  2018      4Q    4.0
19  Name4  Cat1  Type2  2018      1Q    2.0
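
For anyone who wants to try this without the Excel file, here is a minimal self-contained sketch of the same stack approach. The names `wide` and `long_df` and the level names passed to `names=` are my own choices for illustration, not something from the question. The frame is built by hand from the sample table; with a real file, something like `pd.read_excel(path, header=[0, 1, 2, 3], index_col=0)` would presumably produce a similarly shaped frame, but I have not tested that against the asker's file.

```python
import pandas as pd

# Rebuild the sample table from the question by hand (assumption: the real
# file would be loaded with four header rows and Name as the index column).
columns = pd.MultiIndex.from_tuples(
    [
        ("Cat1", "Type1", "2018", "1Q"),
        ("Cat1", "Type1", "2018", "2Q"),
        ("Cat1", "Type1", "2018", "3Q"),
        ("Cat1", "Type1", "2018", "4Q"),
        ("Cat1", "Type2", "2018", "1Q"),
    ],
    # Hypothetical level names; naming them here avoids renaming columns later.
    names=["Cat", "Type", "Year", "Quarter"],
)
wide = pd.DataFrame(
    [[1, 5, 3, 5, 4], [3, 23, 4, 2, 4], [4, 3, 5, 3, 44], [3, 6, 5, 4, 2]],
    index=pd.Index(["Name1", "Name2", "Name3", "Name4"], name="Name"),
    columns=columns,
)

# Stack every column level into the row index (this yields a Series),
# name the values, and flatten the index back into ordinary columns.
long_df = (
    wide.stack(["Cat", "Type", "Year", "Quarter"])
    .rename("Value")
    .reset_index()
)

print(long_df.head())
```

Because the index and the column levels carry names, `reset_index()` produces the `Name`, `Cat`, `Type`, `Year`, `Quarter` headers directly. Depending on the pandas version, the `Value` column may come out as integers or floats, and recent 2.x releases may show a FutureWarning about the stack implementation.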
