
New Columns based on column value in pandas from long to wide format

I have a dataframe in long format with a unique identifier in one column. My goal is to pivot it to wide format so that there is one row per user_id (student).

Current dataframe example:

   user_id    test_type       test_date  
0  1          ACT             2013-08-15                           
1  2          ACT             2011-12-09                          
2  3          SAT             2012-03-09                      
3  4          ACT             2003-07-27                         
4  4          SAT             2013-12-31 

The problem is that some students have taken both tests so I want to ultimately have one column for ACT, one column for SAT, and a column each for the corresponding date.

Desired Format:

   user_id    test_ACT        ACT_date       test_SAT     SAT_date 
0  1          ACT             2013-08-15       NaN           NaN      
1  2          ACT             2011-12-09       NaN           NaN  
2  3          NaN                 NaN          SAT        2012-03-09
3  4          ACT             2003-07-27       SAT        2013-12-31

I have tried to groupby and pivot:

import pandas as pd

# number each student's rows 0, 1, ... in order of appearance
df['idx'] = df.groupby('user_id').cumcount()

tmp = []
for var in ['test_type','test_date']:
    # build column labels such as test_type_0, test_date_1
    df['tmp_idx'] = var + '_' + df.idx.astype(str)
    tmp.append(df.pivot(index='user_id',columns='tmp_idx',values=var))

df_wide = pd.concat(tmp,axis=1).reset_index()

This means that the format is wide but not separated by test type.

Output from attempt but not desired:

   user_id   test_type_0      test_date_0       test_type_1   test_date_1 
0  1          ACT             2013-08-15             NaN           NaN      
1  2          ACT             2011-12-09             NaN           NaN  
2  3          SAT             2012-03-09             NaN           NaN
3  4          ACT             2003-07-27             SAT          2013-12-31

After trying the provided answer:

   index  user_id  ACT_date    test_ACT  user_id  SAT_date    test_SAT
0  0      1.0      2013-08-15  ACT       NaN      NaN         NaN
1  1      2.0      2011-12-09  ACT       NaN      NaN         NaN
2  2      NaN      NaN         NaN       3.0      2012-03-09  SAT
3  3      4.0      2003-07-27  ACT       NaN      NaN         NaN
4  4      NaN      NaN         NaN       4.0      2013-12-31  SAT

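This should work:

# ACT rows: index by user_id and keep only the date column
df1 = df[df.test_type == 'ACT'].set_index('user_id').drop(columns='test_type')
df1.columns = ['ACT_date']
df1['test_ACT'] = 'ACT'

# SAT rows, handled the same way
df2 = df[df.test_type == 'SAT'].set_index('user_id').drop(columns='test_type')
df2.columns = ['SAT_date']
df2['test_SAT'] = 'SAT'

# concat aligns on the user_id index, so each student ends up on one row
finaldf = pd.concat([df1, df2], axis=1).reset_index()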
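The five-row output shown in the question's follow-up appears to be what happens when user_id is not the index at the time of the concat: pd.concat with axis=1 aligns rows on the index, so with the default integer index every source row stays on its own line. The sketch below rebuilds the question's sample data and compares the two cases. The names act, sat, misaligned and aligned are illustrative, and the keys argument is used here only to keep the duplicated column names apart.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 4],
    "test_type": ["ACT", "ACT", "SAT", "ACT", "SAT"],
    "test_date": ["2013-08-15", "2011-12-09", "2012-03-09",
                  "2003-07-27", "2013-12-31"],
})

act = df[df.test_type == "ACT"]
sat = df[df.test_type == "SAT"]

# Default integer index: rows are matched on labels 0..4,
# so no ACT row ever shares a label with a SAT row.
misaligned = pd.concat([act, sat], axis=1)

# user_id as the index: rows for the same student are matched up.
aligned = pd.concat(
    [act.set_index("user_id"), sat.set_index("user_id")],
    axis=1,
    keys=["ACT", "SAT"],
).sort_index()

print(misaligned.shape)
print(aligned.shape)
```

If the result still has one row per test rather than one row per student, checking that both pieces are indexed by user_id before the concat is a reasonable first step.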
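Another option is to index by user_id and test type, then unstack:

# create a temporary copy of test_type and use it,
# together with user_id, as the index
res = (df.assign(temp=df.test_type)
         .set_index(['user_id', 'temp']))

# unstack the temp level, drop the now-redundant outer column level,
# and rename the columns
(res.unstack()
    .droplevel(0, axis=1)
    .set_axis(['test_ACT', 'test_SAT', 'ACT_date', 'SAT_date'], axis=1))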


        test_ACT    test_SAT    ACT_date    SAT_date
user_id             
1        ACT        NaN         2013-08-15  NaN
2        ACT        NaN         2011-12-09  NaN
3        NaN        SAT         NaN         2012-03-09
4        ACT        SAT         2003-07-27  2013-12-31
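For comparison, here is a minimal sketch of a third route that pivots directly on test_type instead of on a running counter, which is the piece the attempt in the question was missing. It is not taken from either answer. It assumes each student has at most one row per test type, since DataFrame.pivot raises an error on duplicate index/column pairs, and the names wide, tests and ordered are illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 4],
    "test_type": ["ACT", "ACT", "SAT", "ACT", "SAT"],
    "test_date": ["2013-08-15", "2011-12-09", "2012-03-09",
                  "2003-07-27", "2013-12-31"],
})

# One row per student, one date column per test type
wide = df.pivot(index="user_id", columns="test_type", values="test_date")

tests = list(wide.columns)                   # test types found in the data
wide.columns = [f"{t}_date" for t in tests]  # e.g. ACT -> ACT_date

# Rebuild the test_<type> label columns: the test name where a date
# exists, NaN otherwise (False is not in the mapping, so it maps to NaN)
for t in tests:
    wide[f"test_{t}"] = wide[f"{t}_date"].notna().map({True: t})

# Order the columns as in the desired format and restore user_id as a column
ordered = [c for t in tests for c in (f"test_{t}", f"{t}_date")]
wide = wide[ordered].reset_index()

print(wide)
```

If a student could have several rows for the same test, pivot_table with an explicit aggfunc (for example "first" or "max") might be a better fit than pivot, at the cost of keeping only one date per test.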
