
Pandas: reshaping a dataframe by splitting column names into a column and a variable

I have the following dataframe that I am trying to melt:

import numpy as np
import pandas as pd
dates = pd.date_range('1/1/2014', periods=4)
df = pd.DataFrame(np.eye(4), index=dates, columns=['A_var1', 'A_var2', 'B_var1', 'B_var2'])
print(df)

             A_var1  A_var2  B_var1  B_var2
2014-01-01     1.0     0.0     0.0     0.0
2014-01-02     0.0     1.0     0.0     0.0
2014-01-03     0.0     0.0     1.0     0.0
2014-01-04     0.0     0.0     0.0     1.0

I want to obtain the following:

            type    var1    var2  
2014-01-01   A      1.0     0.0    
2014-01-01   B      0.0     0.0    
2014-01-02   A      0.0     1.0     
2014-01-02   B      0.0     0.0  
2014-01-03   A      0.0     0.0    
2014-01-03   B      1.0     0.0
2014-01-04   A      0.0     0.0     
2014-01-04   B      0.0     1.0

Any idea on how to do that efficiently? I know I can use the melt function but I can't get it to work in that context.

Many thanks,

You could split the column names into a MultiIndex and then use stack on its first level.

In [304]: df.columns = df.columns.str.split('_', expand=True)

In [305]: df.stack(0).reset_index(1)
Out[305]:
           level_1  var1  var2
2014-01-01       A   1.0   0.0
2014-01-01       B   0.0   0.0
2014-01-02       A   0.0   1.0
2014-01-02       B   0.0   0.0
2014-01-03       A   0.0   0.0
2014-01-03       B   1.0   0.0
2014-01-04       A   0.0   0.0
2014-01-04       B   0.0   1.0
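
The new column above is labelled level_1 because the split column levels have no names. A minimal sketch of one way to get the requested type header, assuming you name the first level before stacking (the variable names wide and result are only illustrative, and recent pandas versions may emit a FutureWarning about the stack implementation):

```python
import numpy as np
import pandas as pd

dates = pd.date_range('1/1/2014', periods=4)
df = pd.DataFrame(np.eye(4), index=dates,
                  columns=['A_var1', 'A_var2', 'B_var1', 'B_var2'])

# Work on a copy so the original column labels stay untouched.
wide = df.copy()
# Split 'A_var1' into ('A', 'var1') and name the first column level 'type'.
wide.columns = wide.columns.str.split('_', expand=True).set_names(['type', None])
# Move the 'type' level from the columns into the index, then into a column.
result = wide.stack('type').reset_index('type')
print(result)
```

Naming the level up front means reset_index produces a type column directly, so no rename is needed afterwards.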

Another option is the pivot_longer function from pyjanitor, using the .value placeholder:

# pip install pyjanitor
import pandas as pd
import janitor

df.pivot_longer(names_to=("type", ".value"), 
                names_sep="_", 
                ignore_index=False, 
                sort_by_appearance=True)

          type  var1  var2
2014-01-01   A   1.0   0.0
2014-01-01   B   0.0   0.0
2014-01-02   A   0.0   1.0
2014-01-02   B   0.0   0.0
2014-01-03   A   0.0   0.0
2014-01-03   B   1.0   0.0
2014-01-04   A   0.0   0.0
2014-01-04   B   0.0   1.0

The .value placeholder keeps the matching part of each column name (var1, var2) as a column header, while the other part (A, B) goes into the type column.
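
To make the role of .value more concrete, here is a rough plain-pandas sketch of the same reshaping using melt, which the question mentions. It is not how pyjanitor implements pivot_longer; the helper names date, name and header are assumptions introduced only for this illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('1/1/2014', periods=4)
df = pd.DataFrame(np.eye(4), index=dates,
                  columns=['A_var1', 'A_var2', 'B_var1', 'B_var2'])

# Melt to long form, keeping the dates as a regular column named 'date'.
long = df.rename_axis('date').reset_index().melt(id_vars='date', var_name='name')
# Split each old column label at the underscore: prefix -> 'type', suffix -> 'header'.
long[['type', 'header']] = long['name'].str.split('_', expand=True)
# Spread the suffixes back out as columns; this is the part '.value' stands for.
result = (long.pivot(index=['date', 'type'], columns='header', values='value')
              .reset_index('type'))
# Drop the helper axis names so the output matches the requested layout.
result.index.name = None
result.columns.name = None
print(result)
```

The suffix after the underscore ends up as the column headers (var1, var2), and the prefix ends up as the values of type, which is the same split that names_to=("type", ".value") describes.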
