pandas pivot table - rearrange
I have a pandas data frame with some columns. I want to rearrange them in a different way. An example is below:
time,name,feature,value
33 20 May 2016 14:00:00 -0700,John,badL,2
45 19 May 2016 18:00:00 -0700,John,badL,1
120 17 May 2016 11:00:00 -0700,John,badL,1
220 20 May 2016 14:00:00 -0700,John,totalL,20
450 19 May 2016 18:00:00 -0700,John,totalL,15
330 18 May 2016 15:00:00 -0700,Mary,badL,2
330 18 May 2016 15:00:00 -0700,Mary,totalL,20
550 21 May 2016 12:00:00 -0700,Mary,adCmd,4
700 22 May 2016 16:00:00 -0700,Mary,PC,3
800 22 May 2016 16:00:00 -0700,Mary,eCon,200
Note: the first column value (time) is preceded by index values (33, 45, 120, ...). From the above data frame, I want the resulting data frame as:
time,name,badL,totalL,adCmd,PC,eCon
20 May 2016 14:00:00 -0700,John,2,20,0,0,0
19 May 2016 18:00:00 -0700,John,1,15,0,0,0
17 May 2016 11:00:00 -0700,John,1,0,0,0,0
18 May 2016 15:00:00 -0700,Mary,2,20,0,0,0
21 May 2016 12:00:00 -0700,Mary,0,0,4,0,0
22 May 2016 16:00:00 -0700,Mary,0,0,0,3,200
NOTE: for 17 May, John did not have any totalL, so it is filled with 0.
Is there an elegant way to do this? I am converting the time field with pd.to_datetime and then comparing, which looks tedious. For the above example, I have only two 'features' (badL, totalL). I will have several more later.
This is what I have, but it adds a separate row for the second feature (totalL) rather than putting it in the same row.
for f in ['badL', 'totalL']:
    dff = df[df.feature == f]
    print(dff)
    if len(dff.index) > 0:
        fullFeatureDf[f] = dff.feature_value
from io import StringIO
import pandas as pd
text = '''time,name,f1,value
20 May 2016 14:00:00 -0700,John,badL,2
19 May 2016 18:00:00 -0700,John,badL,1
17 May 2016 11:00:00 -0700,John,badL,1
20 May 2016 14:00:00 -0700,John,totalL,20
19 May 2016 18:00:00 -0700,John,totalL,15
17 May 2016 11:00:00 -0700,John,totalL,12
'''
df = pd.read_csv(StringIO(text))
print(df)
                         time  name      f1  value
0  20 May 2016 14:00:00 -0700  John    badL      2
1  19 May 2016 18:00:00 -0700  John    badL      1
2  17 May 2016 11:00:00 -0700  John    badL      1
3  20 May 2016 14:00:00 -0700  John  totalL     20
4  19 May 2016 18:00:00 -0700  John  totalL     15
5  17 May 2016 11:00:00 -0700  John  totalL     12
Use set_index followed by unstack. First, move the time, name, and f1 columns into the index:
df = df.set_index(['time', 'name', 'f1'])
print(df)
                                       value
time                       name f1
20 May 2016 14:00:00 -0700 John badL       2
19 May 2016 18:00:00 -0700 John badL       1
17 May 2016 11:00:00 -0700 John badL       1
20 May 2016 14:00:00 -0700 John totalL    20
19 May 2016 18:00:00 -0700 John totalL    15
17 May 2016 11:00:00 -0700 John totalL    12
Then unstack to perform the pivot. It takes part of the row index (by default the innermost level, here f1) and moves it into the columns.
print(df.unstack())
                                value
f1                               badL totalL
time                       name
17 May 2016 11:00:00 -0700 John     1     12
19 May 2016 18:00:00 -0700 John     1     15
20 May 2016 14:00:00 -0700 John     2     20
In spirit, this is identical to Yakym Pirozhenko's solution, just a slightly different way of doing it. This is more intuitive to me but may not be to you.
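For the data in the question, where some (time, name) pairs lack a feature, the same set_index/unstack idea can be extended with a fill value. The following is only a sketch: it assumes the leading index numbers are not part of the time column, and the name wide is made up for illustration.

```python
from io import StringIO

import pandas as pd

# Sample rows from the question, typed without the leading index numbers.
text = '''time,name,feature,value
20 May 2016 14:00:00 -0700,John,badL,2
19 May 2016 18:00:00 -0700,John,badL,1
17 May 2016 11:00:00 -0700,John,badL,1
20 May 2016 14:00:00 -0700,John,totalL,20
19 May 2016 18:00:00 -0700,John,totalL,15
18 May 2016 15:00:00 -0700,Mary,badL,2
18 May 2016 15:00:00 -0700,Mary,totalL,20
21 May 2016 12:00:00 -0700,Mary,adCmd,4
22 May 2016 16:00:00 -0700,Mary,PC,3
22 May 2016 16:00:00 -0700,Mary,eCon,200
'''
df = pd.read_csv(StringIO(text))

# Move the three key columns into the index, pick the single 'value'
# column as a Series, then unstack the innermost level ('feature').
# fill_value=0 fills (time, name) pairs that have no row for a feature.
wide = (
    df.set_index(['time', 'name', 'feature'])['value']
      .unstack(fill_value=0)
      .reset_index()
)
wide.columns.name = None  # drop the leftover 'feature' label on the columns

print(wide)
```

Selecting ['value'] before unstacking keeps the columns flat (one per feature) instead of nesting them under a 'value' level. Note that unstack raises an error if the same (time, name, feature) combination appears more than once.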
This is a job for df.pivot:
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO(
'''
time,name,feature,value
33 20 May 2016 14:00:00 -0700,John,badL,2
45 19 May 2016 18:00:00 -0700,John,badL,1
120 17 May 2016 11:00:00 -0700,John,badL,1
220 20 May 2016 14:00:00 -0700,John,totalL,20
450 19 May 2016 18:00:00 -0700,John,totalL,15
330 18 May 2016 15:00:00 -0700,Mary,badL,2
330 18 May 2016 15:00:00 -0700,Mary,totalL,20
550 21 May 2016 12:00:00 -0700,Mary,adCmd,4
700 22 May 2016 16:00:00 -0700,Mary,PC,3
800 22 May 2016 16:00:00 -0700,Mary,eCon,200
'''), sep=',').set_index(['time', 'name'])
df_new = df.pivot(columns='feature').fillna(0).astype(int)
#                                     value
# feature                                PC adCmd badL eCon totalL
# time                           name
# 120 17 May 2016 11:00:00 -0700 John     0     0    1    0      0
# 220 20 May 2016 14:00:00 -0700 John     0     0    0    0     20
# 33 20 May 2016 14:00:00 -0700  John     0     0    2    0      0
# 330 18 May 2016 15:00:00 -0700 Mary     0     0    2    0     20
# 45 19 May 2016 18:00:00 -0700  John     0     0    1    0      0
# 450 19 May 2016 18:00:00 -0700 John     0     0    0    0     15
# 550 21 May 2016 12:00:00 -0700 Mary     0     4    0    0      0
# 700 22 May 2016 16:00:00 -0700 Mary     3     0    0    0      0
# 800 22 May 2016 16:00:00 -0700 Mary     0     0    0  200      0
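In the output above, the leading index numbers (33, 45, ...) were read as part of the time string, so rows that share a timestamp are not merged. A possible variant, offered only as a sketch, strips that first token, parses the timestamps, and uses pivot_table with fill_value=0. It assumes every time value starts with such a number; the names wide and feature_order are made up for illustration.

```python
from io import StringIO

import pandas as pd

text = '''time,name,feature,value
33 20 May 2016 14:00:00 -0700,John,badL,2
45 19 May 2016 18:00:00 -0700,John,badL,1
120 17 May 2016 11:00:00 -0700,John,badL,1
220 20 May 2016 14:00:00 -0700,John,totalL,20
450 19 May 2016 18:00:00 -0700,John,totalL,15
330 18 May 2016 15:00:00 -0700,Mary,badL,2
330 18 May 2016 15:00:00 -0700,Mary,totalL,20
550 21 May 2016 12:00:00 -0700,Mary,adCmd,4
700 22 May 2016 16:00:00 -0700,Mary,PC,3
800 22 May 2016 16:00:00 -0700,Mary,eCon,200
'''
df = pd.read_csv(StringIO(text))

# Assumption: every 'time' value starts with an index number followed by
# whitespace. Split once on whitespace and keep only the remainder.
df['time'] = df['time'].str.split(n=1).str[1]

# Parse the timestamps so the result sorts chronologically.
df['time'] = pd.to_datetime(df['time'], format='%d %b %Y %H:%M:%S %z')

# One row per (time, name), one column per feature, 0 where a feature is absent.
# aggfunc='sum' only matters if a (time, name, feature) combination repeats.
wide = df.pivot_table(index=['time', 'name'], columns='feature',
                      values='value', aggfunc='sum', fill_value=0)

# Put the feature columns in the order requested in the question.
feature_order = ['badL', 'totalL', 'adCmd', 'PC', 'eCon']
wide = wide[feature_order]

print(wide)
```

Unlike pivot, pivot_table aggregates duplicate entries instead of raising an error, which may or may not be what you want if the same feature is reported twice for one timestamp.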