
Pandas Dataframe flatten crosstab with multilevel index

I have an Excel file which looks like this:

+-------+-------+-------+-------+-------+-------+
|       | Cat1  | Cat1  | Cat1  | Cat1  | Cat1  |
+-------+-------+-------+-------+-------+-------+
|       | Type1 | Type1 | Type1 | Type1 | Type2 |
+-------+-------+-------+-------+-------+-------+
|       | 2018  | 2018  | 2018  | 2018  | 2018  |
+-------+-------+-------+-------+-------+-------+
| Name  | 1Q    | 2Q    | 3Q    | 4Q    | 1Q    |
+-------+-------+-------+-------+-------+-------+
| Name1 | 1     | 5     | 3     | 5     | 4     |
+-------+-------+-------+-------+-------+-------+
| Name2 | 3     | 23    | 4     | 2     | 4     |
+-------+-------+-------+-------+-------+-------+
| Name3 | 4     | 3     | 5     | 3     | 44    |
+-------+-------+-------+-------+-------+-------+
| Name4 | 3     | 6     | 5     | 4     | 2     |
+-------+-------+-------+-------+-------+-------+

...and so on

I want to format it so that it looks like this:

+-------+------+-------+------+---------+-------+
| Name  | Cat  | Type  | Year | Quarter | Value |
+-------+------+-------+------+---------+-------+
| Name1 | Cat1 | Type1 | 2018 | 1Q      | 1     |
+-------+------+-------+------+---------+-------+
| Name1 | Cat1 | Type1 | 2018 | 2Q      | 5     |
+-------+------+-------+------+---------+-------+
| Name1 | Cat1 | Type1 | 2018 | 3Q      | 3     |
+-------+------+-------+------+---------+-------+
| Name1 | Cat1 | Type1 | 2018 | 4Q      | 5     |
+-------+------+-------+------+---------+-------+
| Name1 | Cat1 | Type2 | 2018 | 1Q      | 4     |
+-------+------+-------+------+---------+-------+

I've loaded it into a pandas DataFrame and am unsure how to proceed now. Is it melt, stack, unstack, MultiIndex...?

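Use stack on all four column levels, then reset_index to turn the resulting row index back into ordinary columns: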

print (df.columns)
MultiIndex(levels=[['Cat1'], ['Type1', 'Type2'], ['2018'], ['1Q', '2Q', '3Q', '4Q']],
           labels=[[0, 0, 0, 0, 0], [0, 0, 0, 0, 1], [0, 0, 0, 0, 0], [0, 1, 2, 3, 0]])


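# Move every column level (Cat, Type, Year, Quarter) into the row index,
# then flatten that index back into regular columns.
df = df.stack([0, 1, 2, 3]).reset_index()
# Name the flattened columns explicitly.
df.columns = ['Name', 'Cat', 'Type', 'Year', 'Quarter', 'Value']
print(df)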
     Name   Cat   Type  Year Quarter  Value
0   Name1  Cat1  Type1  2018      1Q    1.0
1   Name1  Cat1  Type1  2018      2Q    5.0
2   Name1  Cat1  Type1  2018      3Q    3.0
3   Name1  Cat1  Type1  2018      4Q    5.0
4   Name1  Cat1  Type2  2018      1Q    4.0
5   Name2  Cat1  Type1  2018      1Q    3.0
6   Name2  Cat1  Type1  2018      2Q   23.0
7   Name2  Cat1  Type1  2018      3Q    4.0
8   Name2  Cat1  Type1  2018      4Q    2.0
9   Name2  Cat1  Type2  2018      1Q    4.0
10  Name3  Cat1  Type1  2018      1Q    4.0
11  Name3  Cat1  Type1  2018      2Q    3.0
12  Name3  Cat1  Type1  2018      3Q    5.0
13  Name3  Cat1  Type1  2018      4Q    3.0
14  Name3  Cat1  Type2  2018      1Q   44.0
15  Name4  Cat1  Type1  2018      1Q    3.0
16  Name4  Cat1  Type1  2018      2Q    6.0
17  Name4  Cat1  Type1  2018      3Q    5.0
18  Name4  Cat1  Type1  2018      4Q    4.0
19  Name4  Cat1  Type2  2018      1Q    2.0
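
For anyone who wants to try this without the Excel file, here is a minimal sketch that rebuilds the sample table by hand and reshapes it. The variable names (`wide`, `stacked`, `melted`) and the level names passed to `MultiIndex.from_tuples` are assumptions made for illustration; the real frame may have unnamed levels, in which case the column names have to be assigned afterwards as in the answer above. Since the question also asks about melt, the sketch includes a melt variant (it needs `ignore_index=False`, available in pandas 1.1 and later) and sorts both results so they can be compared row by row.

```python
import pandas as pd

# Header rebuilt from the question's table: one tuple per column,
# ordered as (Cat, Type, Year, Quarter). The level names are assumed.
columns = pd.MultiIndex.from_tuples(
    [
        ("Cat1", "Type1", "2018", "1Q"),
        ("Cat1", "Type1", "2018", "2Q"),
        ("Cat1", "Type1", "2018", "3Q"),
        ("Cat1", "Type1", "2018", "4Q"),
        ("Cat1", "Type2", "2018", "1Q"),
    ],
    names=["Cat", "Type", "Year", "Quarter"],
)
wide = pd.DataFrame(
    [[1, 5, 3, 5, 4], [3, 23, 4, 2, 4], [4, 3, 5, 3, 44], [3, 6, 5, 4, 2]],
    index=pd.Index(["Name1", "Name2", "Name3", "Name4"], name="Name"),
    columns=columns,
)

sort_keys = ["Name", "Type", "Quarter"]

# Variant 1: stack all four column levels into the row index.
# Because the levels and the index are named, reset_index() already
# produces the desired column names.
stacked = (
    wide.stack([0, 1, 2, 3])
    .rename("Value")
    .reset_index()
    .sort_values(sort_keys)
    .reset_index(drop=True)
)

# Variant 2: melt, keeping the Name index so it can be restored as a column.
melted = (
    wide.melt(ignore_index=False, value_name="Value")
    .reset_index()
    .sort_values(sort_keys)
    .reset_index(drop=True)
)

print(stacked)
```

Depending on the pandas version, the stacked `Value` column may come back as float (as in the output above) or stay integer; add `.astype(int)` if integer values are required and no cells are missing.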
