Transpose/Pivot DataFrame but not all columns in the same row

I have a DataFrame and I want to transpose it.

import pandas as pd
df = pd.DataFrame({'ID':[111,111,222,222,333,333],'Month':['Jan','Feb','Jan','Feb','Jan','Feb'],
                   'Employees':[2,3,1,5,7,1],'Subsidy':[20,30,10,15,40,5]})
print(df)
    ID Month  Employees  Subsidy
0  111   Jan          2       20
1  111   Feb          3       30
2  222   Jan          1       10
3  222   Feb          5       15
4  333   Jan          7       40
5  333   Feb          1        5

Desired output:

    ID        Var   Jan  Feb
0  111  Employees     2    3
1  111    Subsidy    20   30
0  222  Employees     1    5
1  222    Subsidy    10   15
0  333  Employees     7    1
1  333    Subsidy    40    5

My attempt: I tried pivot_table(), but Employees and Subsidy naturally end up in the same row, whereas I want them on separate rows.

df.pivot_table(index=['ID'],columns='Month',values=['Employees','Subsidy'])

        Employees   Subsidy
Month    Feb  Jan  Feb  Jan
ID
111        3    2   30   20 
222        5    1   15   10 
333        1    7    5   40 

I also tried transpose(), but it transposes the entire DataFrame; there seems to be no way to transpose while keeping one column fixed. Any suggestions?

After pivoting, use DataFrame.rename_axis to name the first column level Var and to set the second level's name to None, so that Month does not show up as a column-axis name in the final DataFrame. Then reshape with DataFrame.stack on the first level, and finally convert the resulting MultiIndex into columns with DataFrame.reset_index:

df2 = (df.pivot_table(index='ID',
                      columns='Month',
                      values=['Employees', 'Subsidy'])
         .rename_axis(['Var', None], axis=1)
         .stack(level=0)
         .reset_index())
print(df2)
    ID        Var  Feb  Jan
0  111  Employees    3    2
1  111    Subsidy   30   20
2  222  Employees    5    1
3  222    Subsidy   15   10
4  333  Employees    1    7
5  333    Subsidy    5   40
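
Note that pivot_table sorts the month columns alphabetically (Feb, Jan), while the desired output has them in calendar order. A minimal sketch of one way to reorder them, assuming the months of interest are known in advance (the month_order list below is a hypothetical helper, not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [111, 111, 222, 222, 333, 333],
    'Month': ['Jan', 'Feb', 'Jan', 'Feb', 'Jan', 'Feb'],
    'Employees': [2, 3, 1, 5, 7, 1],
    'Subsidy': [20, 30, 10, 15, 40, 5],
})

# Same pipeline as in the answer above.
df2 = (df.pivot_table(index='ID',
                      columns='Month',
                      values=['Employees', 'Subsidy'])
         .rename_axis(['Var', None], axis=1)
         .stack(level=0)
         .reset_index())

# Assumption: the calendar order of the months present in this sample.
month_order = ['Jan', 'Feb']

# Select the columns explicitly to put the months in calendar order.
df2 = df2[['ID', 'Var'] + month_order]
print(df2)
```

With more months, the same idea applies: list them in calendar order in month_order and select the columns in that order.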

You were on the right track with your pivot_table approach. The only things missing were stack and reset_index:

df.pivot_table(index=['ID'],columns='Month',values=['Employees','Subsidy']).stack(0).reset_index()

Out[42]: 
Month   ID    level_1  Feb  Jan
0      111  Employees    3    2
1      111    Subsidy   30   20
2      222  Employees    5    1
3      222    Subsidy   15   10
4      333  Employees    1    7
5      333    Subsidy    5   40

You can rename the level_1 column to Var afterwards if needed.
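
The rename can be sketched as follows. This assumes the stacked level comes out as level_1, as in the output above; the extra rename_axis call, which drops the leftover Month label on the column axis, is an addition that was not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [111, 111, 222, 222, 333, 333],
    'Month': ['Jan', 'Feb', 'Jan', 'Feb', 'Jan', 'Feb'],
    'Employees': [2, 3, 1, 5, 7, 1],
    'Subsidy': [20, 30, 10, 15, 40, 5],
})

out = (df.pivot_table(index=['ID'],
                      columns='Month',
                      values=['Employees', 'Subsidy'])
         .stack(0)
         .reset_index()
         # Give the stacked level a meaningful name.
         .rename(columns={'level_1': 'Var'})
         # Drop the 'Month' name that is still attached to the column axis.
         .rename_axis(None, axis=1))
print(out)
```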

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact yoyou2525@163.com.
