pandas pivot table - rearrange

I have a pandas data frame in long format and want to reshape it so that each distinct value in the feature column becomes its own column. An example is below:

time,name,feature,value
33 20 May 2016 14:00:00 -0700,John,badL,2
45 19 May 2016 18:00:00 -0700,John,badL,1
120 17 May 2016 11:00:00 -0700,John,badL,1
220 20 May 2016 14:00:00 -0700,John,totalL,20
450 19 May 2016 18:00:00 -0700,John,totalL,15
330 18 May 2016 15:00:00 -0700,Mary,badL,2
330 18 May 2016 15:00:00 -0700,Mary,totalL,20
550 21 May 2016 12:00:00 -0700,Mary,adCmd,4
700 22 May 2016 16:00:00 -0700,Mary,PC,3
800 22 May 2016 16:00:00 -0700,Mary,eCon,200

Note: each time value above is preceded by the data frame's index value (33, 45, 120, ...), which is not part of the time field. From the above data frame, I want the following result:

time,name,badL,totalL,adCmd,PC,eCon
20 May 2016 14:00:00 -0700,John,2,20,0,0,0
19 May 2016 18:00:00 -0700,John,1,15,0,0,0
17 May 2016 11:00:00 -0700,John,1,0,0,0,0
18 May 2016 15:00:00 -0700,Mary,2,20,0,0,0
21 May 2016 12:00:00 -0700,Mary,0,0,4,0,0
22 May 2016 16:00:00 -0700,Mary,0,0,0,3,200

NOTE: John has no totalL entry for 17 May, so that cell is filled with 0.

Is there an elegant way to do this? My current approach converts the time field with pd.to_datetime and then compares rows, which is tedious. For now I am handling only two features (badL, totalL); I will have several more later.

This is what I have so far, but it adds a separate row for the second feature (totalL) rather than putting it in the same row:

for f in ['badL', 'totalL']:
    dff = df[df.feature == f]
    print(dff)
    if len(dff.index) > 0:
        fullFeatureDf[f] = dff.feature_value

Setup

from io import StringIO
import pandas as pd

text = '''time,name,f1,value
20 May 2016 14:00:00 -0700,John,badL,2
19 May 2016 18:00:00 -0700,John,badL,1
17 May 2016 11:00:00 -0700,John,badL,1
20 May 2016 14:00:00 -0700,John,totalL,20
19 May 2016 18:00:00 -0700,John,totalL,15
17 May 2016 11:00:00 -0700,John,totalL,12
'''

df = pd.read_csv(StringIO(text))

print(df)

                         time  name      f1  value
0  20 May 2016 14:00:00 -0700  John    badL      2
1  19 May 2016 18:00:00 -0700  John    badL      1
2  17 May 2016 11:00:00 -0700  John    badL      1
3  20 May 2016 14:00:00 -0700  John  totalL     20
4  19 May 2016 18:00:00 -0700  John  totalL     15
5  17 May 2016 11:00:00 -0700  John  totalL     12

Solution using unstack

df = df.set_index(['time', 'name', 'f1'])

print(df)

                                        value
time                       name f1           
20 May 2016 14:00:00 -0700 John badL        2
19 May 2016 18:00:00 -0700 John badL        1
17 May 2016 11:00:00 -0700 John badL        1
20 May 2016 14:00:00 -0700 John totalL     20
19 May 2016 18:00:00 -0700 John totalL     15
17 May 2016 11:00:00 -0700 John totalL     12

Then unstack to perform the pivot. By default it takes the innermost level of the row index (here f1) and moves it into the columns.

print(df.unstack())

                                value       
f1                               badL totalL
time                       name             
17 May 2016 11:00:00 -0700 John     1     12
19 May 2016 18:00:00 -0700 John     1     15
20 May 2016 14:00:00 -0700 John     2     20
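To get closer to the flat layout the question asks for, the same unstack idea can be sketched as below. This is only a sketch under some assumptions: it uses John's rows from the question (so 17 May has no totalL entry), it selects the value column as a Series before unstacking so the result has plain column labels, and the name `wide` is just illustrative.

```python
from io import StringIO
import pandas as pd

# John's rows from the question; note there is no totalL row for 17 May.
text = '''time,name,f1,value
20 May 2016 14:00:00 -0700,John,badL,2
19 May 2016 18:00:00 -0700,John,badL,1
17 May 2016 11:00:00 -0700,John,badL,1
20 May 2016 14:00:00 -0700,John,totalL,20
19 May 2016 18:00:00 -0700,John,totalL,15
'''

df = pd.read_csv(StringIO(text))

wide = (
    df.set_index(['time', 'name', 'f1'])['value']  # Series with a 3-level index
      .unstack('f1', fill_value=0)                 # missing combinations become 0
      .reset_index()                               # time and name back to columns
)
wide.columns.name = None  # drop the leftover 'f1' label on the column axis

print(wide)
```

Passing fill_value=0 to unstack fills the gaps directly, so no separate fillna step is needed. Like the unstack call above, this raises an error if the same (time, name, f1) combination appears more than once.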

In spirit, this is the same solution as Yakym Pirozhenko's answer, just done slightly differently. It is more intuitive to me, but may not be to you.

This is a job for df.pivot:

import pandas as pd
from io import StringIO

df = pd.read_csv(StringIO(
'''
time,name,feature,value
33 20 May 2016 14:00:00 -0700,John,badL,2
45 19 May 2016 18:00:00 -0700,John,badL,1
120 17 May 2016 11:00:00 -0700,John,badL,1
220 20 May 2016 14:00:00 -0700,John,totalL,20
450 19 May 2016 18:00:00 -0700,John,totalL,15
330 18 May 2016 15:00:00 -0700,Mary,badL,2
330 18 May 2016 15:00:00 -0700,Mary,totalL,20
550 21 May 2016 12:00:00 -0700,Mary,adCmd,4
700 22 May 2016 16:00:00 -0700,Mary,PC,3
800 22 May 2016 16:00:00 -0700,Mary,eCon,200
'''), sep=',').set_index(['time', 'name'])

df_new = df.pivot(columns='feature').fillna(0).astype(int)

#                                     value
# feature                                PC adCmd badL eCon totalL
# time                           name
# 120 17 May 2016 11:00:00 -0700 John     0     0    1    0      0
# 220 20 May 2016 14:00:00 -0700 John     0     0    0    0     20
# 33 20 May 2016 14:00:00 -0700  John     0     0    2    0      0
# 330 18 May 2016 15:00:00 -0700 Mary     0     0    2    0     20
# 45 19 May 2016 18:00:00 -0700  John     0     0    1    0      0
# 450 19 May 2016 18:00:00 -0700 John     0     0    0    0     15
# 550 21 May 2016 12:00:00 -0700 Mary     0     4    0    0      0
# 700 22 May 2016 16:00:00 -0700 Mary     3     0    0    0      0
# 800 22 May 2016 16:00:00 -0700 Mary     0     0    0  200      0
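Note that in the output above, rows with the same timestamp (for example John on 20 May) stay separate, because the pasted index numbers are still part of the time strings. One possible way to get the layout the question asks for is sketched below. It assumes the leading number is always the first space-separated token of the time field, and that summing is acceptable if a (time, name, feature) combination ever repeats. The names `raw` and `wide` are only illustrative.

```python
from io import StringIO
import pandas as pd

raw = '''time,name,feature,value
33 20 May 2016 14:00:00 -0700,John,badL,2
45 19 May 2016 18:00:00 -0700,John,badL,1
120 17 May 2016 11:00:00 -0700,John,badL,1
220 20 May 2016 14:00:00 -0700,John,totalL,20
450 19 May 2016 18:00:00 -0700,John,totalL,15
330 18 May 2016 15:00:00 -0700,Mary,badL,2
330 18 May 2016 15:00:00 -0700,Mary,totalL,20
550 21 May 2016 12:00:00 -0700,Mary,adCmd,4
700 22 May 2016 16:00:00 -0700,Mary,PC,3
800 22 May 2016 16:00:00 -0700,Mary,eCon,200
'''
df = pd.read_csv(StringIO(raw))

# Assumption: the first space-separated token is the pasted index number, so drop it.
df['time'] = df['time'].str.split(' ', n=1).str[1]

wide = (
    df.pivot_table(index=['time', 'name'], columns='feature',
                   values='value', aggfunc='sum', fill_value=0)
      .astype(int)     # keep integer counts, as in the answer above
      .reset_index()   # time and name back to ordinary columns
)
wide.columns.name = None  # drop the leftover 'feature' label on the column axis

# Reorder the columns to match the layout in the question.
wide = wide[['time', 'name', 'badL', 'totalL', 'adCmd', 'PC', 'eCon']]

print(wide)
```

pivot_table is used here instead of pivot because it takes fill_value directly and tolerates repeated combinations by aggregating them. The rows come out sorted by the time string, not chronologically; converting time with pd.to_datetime first would give a chronological order.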

