I have a pandas data frame with some columns. I want to rearrange them in a different way. An example is below:
time,name,feature,value
33 20 May 2016 14:00:00 -0700,John,badL,2
45 19 May 2016 18:00:00 -0700,John,badL,1
120 17 May 2016 11:00:00 -0700,John,badL,1
220 20 May 2016 14:00:00 -0700,John,totalL,20
450 19 May 2016 18:00:00 -0700,John,totalL,15
330 18 May 2016 15:00:00 -0700,Mary,badL,2
330 18 May 2016 15:00:00 -0700,Mary,totalL,20
550 21 May 2016 12:00:00 -0700,Mary,adCmd,4
700 22 May 2016 16:00:00 -0700,Mary,PC,3
800 22 May 2016 16:00:00 -0700,Mary,eCon,200
Note: each time value is preceded by the row's index label (33, 45, 120, ...), which is not part of the data. From the above data frame, I want the resulting data frame to be:
time,name,badL,totalL,adCmd,PC,eCon
20 May 2016 14:00:00 -0700,John,2,20,0,0,0
19 May 2016 18:00:00 -0700,John,1,15,0,0,0
17 May 2016 11:00:00 -0700,John,1,0,0,0,0
18 May 2016 15:00:00 -0700,Mary,2,20,0,0,0
21 May 2016 12:00:00 -0700,Mary,0,0,4,0,0
22 May 2016 16:00:00 -0700,Mary,0,0,0,3,200
Note: John has no totalL entry for 17 May, so that cell is filled with 0.
Is there an elegant way to do this? I am converting the time field with pd.to_datetime and then comparing timestamps, which looks tedious. For the above example, I have only two 'features' (badL, totalL). I will have several more later.
This is what I have, but it adds a separate row for the second feature (totalL) rather than putting it in the same row:
for f in ['badL', 'totalL']:
    dff = df[df.feature == f]
    print(dff)
    if len(dff.index) > 0:
        fullFeatureDf[f] = dff.feature_value
from io import StringIO
import pandas as pd
text = '''time,name,f1,value
20 May 2016 14:00:00 -0700,John,badL,2
19 May 2016 18:00:00 -0700,John,badL,1
17 May 2016 11:00:00 -0700,John,badL,1
20 May 2016 14:00:00 -0700,John,totalL,20
19 May 2016 18:00:00 -0700,John,totalL,15
17 May 2016 11:00:00 -0700,John,totalL,12
'''
df = pd.read_csv(StringIO(text))
print(df)
                         time  name      f1  value
0  20 May 2016 14:00:00 -0700  John    badL      2
1  19 May 2016 18:00:00 -0700  John    badL      1
2  17 May 2016 11:00:00 -0700  John    badL      1
3  20 May 2016 14:00:00 -0700  John  totalL     20
4  19 May 2016 18:00:00 -0700  John  totalL     15
5  17 May 2016 11:00:00 -0700  John  totalL     12
Use unstack. First, move time, name, and f1 into the index:
df = df.set_index(['time', 'name', 'f1'])
print(df)
                                        value
time                       name f1
20 May 2016 14:00:00 -0700 John badL        2
19 May 2016 18:00:00 -0700 John badL        1
17 May 2016 11:00:00 -0700 John badL        1
20 May 2016 14:00:00 -0700 John totalL     20
19 May 2016 18:00:00 -0700 John totalL     15
17 May 2016 11:00:00 -0700 John totalL     12
Then call unstack to perform the pivot. By default it takes the innermost level of the row index (f1 here) and moves it into the columns.
print(df.unstack())
                                value
f1                               badL totalL
time                       name
17 May 2016 11:00:00 -0700 John     1     12
19 May 2016 18:00:00 -0700 John     1     15
20 May 2016 14:00:00 -0700 John     2     20
In spirit, this is the same solution as Yakym Pirozhenko's, just done in a slightly different way. It is more intuitive to me, but may not be to you.
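The example above uses a reduced data set in which every time has both features. The following is a minimal sketch of the same set_index / unstack approach applied to the question's full data. It assumes the leading numbers (33, 45, ...) are index labels and not part of the time string, so they are left out here. The name `wide` and the final column reordering are only illustrative choices to mirror the layout requested in the question.

```python
from io import StringIO

import pandas as pd

# The question's rows, with the leading index numbers left out of the time field
# (assumption: they are index labels, as the question's note says).
text = '''time,name,feature,value
20 May 2016 14:00:00 -0700,John,badL,2
19 May 2016 18:00:00 -0700,John,badL,1
17 May 2016 11:00:00 -0700,John,badL,1
20 May 2016 14:00:00 -0700,John,totalL,20
19 May 2016 18:00:00 -0700,John,totalL,15
18 May 2016 15:00:00 -0700,Mary,badL,2
18 May 2016 15:00:00 -0700,Mary,totalL,20
21 May 2016 12:00:00 -0700,Mary,adCmd,4
22 May 2016 16:00:00 -0700,Mary,PC,3
22 May 2016 16:00:00 -0700,Mary,eCon,200
'''
df = pd.read_csv(StringIO(text))

# Selecting the 'value' Series before unstacking avoids a two-level column header.
wide = (
    df.set_index(['time', 'name', 'feature'])['value']
      .unstack(fill_value=0)   # (time, name, feature) combinations with no row become 0
      .reset_index()           # turn time and name back into ordinary columns
)
wide.columns.name = None       # drop the leftover 'feature' label on the column axis

# Reorder the feature columns to match the layout asked for in the question.
wide = wide[['time', 'name', 'badL', 'totalL', 'adCmd', 'PC', 'eCon']]
print(wide)
```

Note that unstack raises an error if the same (time, name, feature) combination appears more than once, so this sketch relies on those combinations being unique.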
This is a job for df.pivot:
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO(
'''
time,name,feature,value
33 20 May 2016 14:00:00 -0700,John,badL,2
45 19 May 2016 18:00:00 -0700,John,badL,1
120 17 May 2016 11:00:00 -0700,John,badL,1
220 20 May 2016 14:00:00 -0700,John,totalL,20
450 19 May 2016 18:00:00 -0700,John,totalL,15
330 18 May 2016 15:00:00 -0700,Mary,badL,2
330 18 May 2016 15:00:00 -0700,Mary,totalL,20
550 21 May 2016 12:00:00 -0700,Mary,adCmd,4
700 22 May 2016 16:00:00 -0700,Mary,PC,3
800 22 May 2016 16:00:00 -0700,Mary,eCon,200
'''), sep=',').set_index(['time', 'name'])
df_new = df.pivot(columns='feature').fillna(0).astype(int)
#                                      value
# feature                              PC  adCmd  badL  eCon  totalL
# time                           name
# 120 17 May 2016 11:00:00 -0700 John   0      0     1     0       0
# 220 20 May 2016 14:00:00 -0700 John   0      0     0     0      20
# 33 20 May 2016 14:00:00 -0700  John   0      0     2     0       0
# 330 18 May 2016 15:00:00 -0700 Mary   0      0     2     0      20
# 45 19 May 2016 18:00:00 -0700  John   0      0     1     0       0
# 450 19 May 2016 18:00:00 -0700 John   0      0     0     0      15
# 550 21 May 2016 12:00:00 -0700 Mary   0      4     0     0       0
# 700 22 May 2016 16:00:00 -0700 Mary   3      0     0     0       0
# 800 22 May 2016 16:00:00 -0700 Mary   0      0     0   200       0
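In the output above, each time value still carries the leading index number from the question (33, 45, 120, ...), so rows for the same timestamp, such as John's two entries for 20 May, are not merged. Below is a minimal sketch that strips those numbers before pivoting. It assumes they are index labels (as the question's note says) and that a real timestamp always starts with a day number followed by a month name. The regular expression and the use of pivot_table with aggfunc='sum' are illustrative choices, not part of the original answer.

```python
from io import StringIO

import pandas as pd

# Same rows as in the answer above, including the leading index numbers.
raw = '''time,name,feature,value
33 20 May 2016 14:00:00 -0700,John,badL,2
45 19 May 2016 18:00:00 -0700,John,badL,1
120 17 May 2016 11:00:00 -0700,John,badL,1
220 20 May 2016 14:00:00 -0700,John,totalL,20
450 19 May 2016 18:00:00 -0700,John,totalL,15
330 18 May 2016 15:00:00 -0700,Mary,badL,2
330 18 May 2016 15:00:00 -0700,Mary,totalL,20
550 21 May 2016 12:00:00 -0700,Mary,adCmd,4
700 22 May 2016 16:00:00 -0700,Mary,PC,3
800 22 May 2016 16:00:00 -0700,Mary,eCon,200
'''
df = pd.read_csv(StringIO(raw))

# Remove a leading number only when a "day month" pair follows it,
# so an already clean timestamp such as "20 May 2016 ..." is left alone.
df['time'] = df['time'].str.replace(r'^\d+\s+(?=\d+\s+[A-Za-z]+)', '', regex=True)

# pivot_table fills missing combinations via fill_value and, unlike pivot,
# tolerates repeated (time, name, feature) keys by aggregating them (summed here).
df_new = df.pivot_table(index=['time', 'name'], columns='feature',
                        values='value', aggfunc='sum', fill_value=0).astype(int)
print(df_new)
```

With the prefixes removed, the ten input rows collapse into six rows, one per (time, name) pair. The rows are ordered by the time string, not chronologically; converting the column with pd.to_datetime before pivoting would give a chronological order.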