简体   繁体   中英

Long to wide data. Pandas

I'm trying to take my dataframe from a long format in which I have a column with a categorical variable, into a wide format in which each category has it's own price column. Currently, my data looks like this:

date-time            date       vendor    payment_type   price
03-10-15 10:00:00    03-10-15     A1            1          50
03-10-15 10:00:00    03-10-15     A1            2          60
03-10-15 10:00:00    03-11-15     A1            1          45
03-10-15 10:00:00    03-11-15     A1            2          70
03-10-15 10:00:00    03-12-15     B1            1          40
03-10-15 10:00:00    03-12-15     B1            2          45
03-10-15 10:00:00    03-10-15     C1            1          60
03-10-15 10:00:00    03-10-15     C1            1          65

My goal is to have a column for every vendor's price and for each payment type and one row per day. When there are multiple values per day, I want to use the maximum value. The end result should look something like this.

Date       A1_Pay1   A2_Pay2 ... C1_Pay1   C1_Pay2
03-10-15     50        60    ...   65        NaN
03-11-15     45        70    ...   NaN       NaN
03-12-15     NaN       NaN   ...   NaN       NaN

I tried using unstack and pivot, but I either wasn't getting what I was going for, or was getting an error about Date not being a unique index.

Any ideas?

You can use pivot_table :

#convert column payment_type to string
df['payment_type'] = df['payment_type'].astype(str)

df = pd.pivot_table(df, index='date', columns=['vendor', 'payment_type'], aggfunc=max)

#remove top level of multiindex
df.columns = df.columns.droplevel(0)

#reset multicolumns
df.columns = ['_Pay'.join(col).strip() for col in df.columns.values]

print df
            A1_Pay1  A1_Pay2  B1_Pay1  B1_Pay2  C1_Pay1
date                                                   
2015-03-10       50       60      NaN      NaN       65
2015-03-11       45       70      NaN      NaN      NaN
2015-03-12      NaN      NaN       40       45      NaN

EDIT:

If you need other statistics, you can add them as list to aggfunc :

#convert column payment_type to string
df['payment_type'] = df['payment_type'].astype(str)
df = pd.pivot_table(df, index='date', columns=['vendor', 'payment_type'], 
                                      aggfunc=[np.mean, np.max, np.median])
print df
              mean                    amax                 median              \
             price                   price                  price               
vendor          A1      B1        C1    A1      B1      C1     A1      B1       
payment_type     1   2   1   2     1     1   2   1   2   1      1   2   1   2   
date                                                                            
2015-03-10      50  60 NaN NaN  62.5    50  60 NaN NaN  65     50  60 NaN NaN   
2015-03-11      45  70 NaN NaN   NaN    45  70 NaN NaN NaN     45  70 NaN NaN   
2015-03-12     NaN NaN  40  45   NaN   NaN NaN  40  45 NaN    NaN NaN  40  45   



vendor          C1  
payment_type     1  
date                
2015-03-10    62.5  
2015-03-11     NaN  
2015-03-12     NaN 
#remove top level of multiindex
df.columns = df.columns.droplevel(1)
#reset multicolumns
df.columns = ['_Pay'.join(col).strip() for col in df.columns.values]
print df
            mean_PayA1_Pay1  mean_PayA1_Pay2  mean_PayB1_Pay1  \
date                                                            
2015-03-10               50               60              NaN   
2015-03-11               45               70              NaN   
2015-03-12              NaN              NaN               40   

            mean_PayB1_Pay2  mean_PayC1_Pay1  amax_PayA1_Pay1  \
date                                                            
2015-03-10              NaN             62.5               50   
2015-03-11              NaN              NaN               45   
2015-03-12               45              NaN              NaN   

            amax_PayA1_Pay2  amax_PayB1_Pay1  amax_PayB1_Pay2  \
date                                                            
2015-03-10               60              NaN              NaN   
2015-03-11               70              NaN              NaN   
2015-03-12              NaN               40               45   

            amax_PayC1_Pay1  median_PayA1_Pay1  median_PayA1_Pay2  \
date                                                                
2015-03-10               65                 50                 60   
2015-03-11              NaN                 45                 70   
2015-03-12              NaN                NaN                NaN   

            median_PayB1_Pay1  median_PayB1_Pay2  median_PayC1_Pay1  
date                                                                 
2015-03-10                NaN                NaN               62.5  
2015-03-11                NaN                NaN                NaN  
2015-03-12                 40                 45                NaN 

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM