简体   繁体   中英

Pandas table - converting multi-index to single index

I'm trying to restructure a table, and now have it in roughly the required format.

I have ended up with a multi-indexed table, I'd like to convert this to single and have the following columns

|    |   'SourceID' |  'P1_VAL_1' |   'P2_VAL_1' |   'P3_VAL_1' |   'P4_VAL_1' |   'P1_VAL_2' |   'P2_VAL_2' |   'P3_VAL_2' |   'P4_VAL_2' |

I can't work out how to combine the index levels?

Example:
|    |   ('SourceID', '') |   ('VAL_1', 'P1_VAL_1') |   ('VAL_1', 'P2_VAL_1') |   ('VAL_1', 'P3_VAL_1') |   ('VAL_1', 'P4_VAL_1') |   ('VAL_2', 'P1_VAL_2') |   ('VAL_2', 'P2_VAL_2') |   ('VAL_2', 'P3_VAL_2') |   ('VAL_2', 'P4_VAL_2') |
|---:|-------------------:|------------------------:|------------------------:|------------------------:|------------------------:|------------------------:|------------------------:|------------------------:|------------------------:|
|  0 |                  1 |                     100 |                     200 |                     300 |                     400 |                    1000 |                    2000 |                    3000 |                    4000 |
|  1 |                  2 |                     500 |                     600 |                     700 |                     nan |                    5000 |                    6000 |                    7000 |                     nan |
|      | SourceID | FieldName | VAL_1 | VAL_2 |
| ---- | -------- | --------- | ----- | ----- |
| 0    | 1        | P1        | 100   | 1000  |
| 1    | 1        | P2        | 200   | 2000  |
| 2    | 1        | P3        | 300   | 3000  |
| 3    | 1        | P4        | 400   | 4000  |
| 4    | 2        | P1        | 500   | 5000  |
| 5    | 2        | P2        | 600   | 6000  |
| 6    | 2        | P3        | 700   | 7000  |
Current Output
 | | ('SourceID', '') | ('VAL_1', 'P1_VAL_1') | ('VAL_1', 'P2_VAL_1') | ('VAL_1', 'P3_VAL_1') | ('VAL_1', 'P4_VAL_1') | ('VAL_2', 'P1_VAL_2') | ('VAL_2', 'P2_VAL_2') | ('VAL_2', 'P3_VAL_2') | ('VAL_2', 'P4_VAL_2') | |---:|-------------------:|------------------------:|------------------------:|------------------------:|------------------------:|------------------------:|------------------------:|------------------------:|------------------------:| | 0 | 1 | 100 | 200 | 300 | 400 | 1000 | 2000 | 3000 | 4000 | | 1 | 2 | 500 | 600 | 700 | nan | 5000 | 6000 | 7000 | nan |
Original Table
| | SourceID | FieldName | VAL_1 | VAL_2 | | ---- | -------- | --------- | ----- | ----- | | 0 | 1 | P1 | 100 | 1000 | | 1 | 1 | P2 | 200 | 2000 | | 2 | 1 | P3 | 300 | 3000 | | 3 | 1 | P4 | 400 | 4000 | | 4 | 2 | P1 | 500 | 5000 | | 5 | 2 | P2 | 600 | 6000 | | 6 | 2 | P3 | 700 | 7000 |

I think your solution should be simplify, because not necessary new MultiIndex , but join values of columns names after unstack :

gby_temp = df.groupby(['SourceID', 'FieldName']).mean().unstack('FieldName')
gby_temp.columns = [f'{b}_{a}' for a, b in gby_temp.columns]
gby_temp = gby_temp.reset_index()

print(gby_temp)
   SourceID  P1_VAL_1  P2_VAL_1  P3_VAL_1  P4_VAL_1  P1_VAL_2  P2_VAL_2  \
0         1     100.0     200.0     300.0     400.0    1000.0    2000.0   
1         2     500.0     600.0     700.0       NaN    5000.0    6000.0   

   P3_VAL_2  P4_VAL_2  
0    3000.0    4000.0  
1    7000.0       NaN  

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM