简体   繁体   中英

Reshaping pandas multi-index dataframe to multi-column

i have been trying to reshape a given pandas dataframe for two days. I would like to transform my multi-index dataframe into multi-column form, but failed greatly by using pd.stack(), pd.unstack(), pd.melt(), ...

I have a generic multi-index dataframe, given as:

import pandas

df = pandas.DataFrame({'Scenario' : ['Scen1', 'Scen1', 'Scen1', 'Scen1', 
                                     'Scen1', 'Scen1', 'Scen2','Scen2',
                                     'Scen2', 'Scen2','Scen2','Scen2'],
                                     'Tech' : ['x', 'y', 'z', 'x', 'y', 'z', 
                                               'x', 'y', 'z', 'x', 'y', 'z'],
                                     'Year' : ['2010', '2010', '2010', 
                                               '2015', '2015', '2015',
                                               '2010', '2010', '2010', 
                                               '2015', '2015', '2015'],
                                     'Sum' : ['1', '2', '3', '4', 
                                              '5', '6', '7', '8', 
                                              '9', '10', '11', '12']})

df.set_index(['Scenario', 'Tech'], inplace=True)
print(df)

                   Sum  Year                                                                                                 
Scenario Tech                                                                                                           
Scen1    x      1  2010                                                                                                 
         y      2  2010                                                                                                 
         z      3  2010                                                                                                 
         x      4  2015                                                                                                 
         y      5  2015                                                                                                 
         z      6  2015                                                                                                 
Scen2    x      7  2010                                                                                                 
         y      8  2010                                                                                                 
         z      9  2010                                                                                                 
         x     10  2015                                                                                                 
         y     11  2015                                                                                                 
         z     12  2015   

However, I would like to convert it into the following form:

              2010         2015         
    Tech   Scen1 Scen2  Scen1 Scen2  
       x    ...   ...    ...   ... 
       y    ...   ...    ...   ...
       z    ...   ...    ...   ...

Iam using python 3.7 for Windows 64bit with the pandas version 0.23.4 and would be very grateful for any kind of tips and possible solutions that will help me to get ahead with my problems.

There are many good answers on melt and pivot on SO. In your sample df, sum column is of string type. Convert it to int and use pivot_table. Key difference between pivot and pivot_table is that when your index contains duplicated entries, you need to use pivot_table with some aggregate function. If you don't pass any function, the default is mean.

df['Sum'] = df['Sum'].astype(int)
df.pivot_table(index = 'Tech', columns = ['Year', 'Scenario'], values = 'Sum')



Year        2010            2015
Scenario    Scen1   Scen2   Scen1   Scen2
Tech                
x           1       7       4       10
y           2       8       5       11
z           3       9       6       12

Note: The same can be done using groupby. Since you need columns at two levels, you need to unstack twice.

df.groupby(['Tech','Scenario','Year'])['Sum'].mean().unstack().unstack()

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM