Pandas pivot based on two columns (multi index)
I have a DataFrame:
u_id date social_interaction_type_id Total_Count
4 2018-08-19 4 5
4 2018-08-24 2 3
4 2018-08-21 1 4
I want to pivot the DataFrame on u_id and date.
The result should look like this:
u_id date 4 2 1
4 2018-08-19 5 nan nan
4 2018-08-24 nan 3 nan
4 2018-08-21 nan nan 4
My attempt:
df.pivot(index = ['u_id','date'] , columns='social_interaction_type_id',values='Total_Count')
Error:
ValueError: Length of passed values is 8803, index implies 1
An alternative solution uses set_index and unstack:
df = (df.set_index(['u_id','date','social_interaction_type_id'])['Total_Count']
.unstack()
.reset_index()
.rename_axis(None, axis=1))
print (df)
u_id date 1 2 4
0 4 2018-08-19 NaN NaN 5.0
1 4 2018-08-21 4.0 NaN NaN
2 4 2018-08-24 NaN 3.0 NaN
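For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the three rows in the question, and the name wide is just an illustrative choice, not something from the original post.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "u_id": [4, 4, 4],
    "date": ["2018-08-19", "2018-08-24", "2018-08-21"],
    "social_interaction_type_id": [4, 2, 1],
    "Total_Count": [5, 3, 4],
})

wide = (
    # All three key columns go into the index ...
    df.set_index(["u_id", "date", "social_interaction_type_id"])["Total_Count"]
      # ... then the innermost level (the type id) is moved into the columns
      .unstack()
      # u_id and date become ordinary columns again
      .reset_index()
      # drop the leftover "social_interaction_type_id" label on the column axis
      .rename_axis(None, axis=1)
)
print(wide)
```

Note that unstack sorts both the rows and the new columns, which is why the type ids come out as 1, 2, 4 rather than in the 4, 2, 1 order shown in the question.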
If the first two columns contain duplicates, use an aggregate function such as mean or sum:
print (df)
u_id date social_interaction_type_id Total_Count
0 4 2018-08-19 4 5 <- 4 2018-08-19
1 4 2018-08-19 6 4 <- 4 2018-08-19
2 4 2018-08-24 2 3
3 4 2018-08-21 1 4
df2 = (df.groupby(['u_id','date','social_interaction_type_id'])['Total_Count']
.mean()
.unstack()
.reset_index()
.rename_axis(None, axis=1))
Or:
df2 = (df.pivot_table(index=['u_id','date'],columns='social_interaction_type_id', values='Total_Count')
.reset_index()
.rename_axis(None, axis=1))
print (df2)
u_id date 1 2 4 6
0 4 2018-08-19 NaN NaN 5.0 4.0
1 4 2018-08-21 4.0 NaN NaN NaN
2 4 2018-08-24 NaN 3.0 NaN NaN
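Aggregation really matters once the same (u_id, date, social_interaction_type_id) combination shows up more than once. The sketch below uses made-up data (not from the question) in which one key is repeated: plain unstack refuses to reshape it, while pivot_table combines the repeated rows with whatever aggfunc is passed.

```python
import pandas as pd

# Hypothetical data: the key (4, "2018-08-19", 4) appears twice
df = pd.DataFrame({
    "u_id": [4, 4, 4],
    "date": ["2018-08-19", "2018-08-19", "2018-08-24"],
    "social_interaction_type_id": [4, 4, 2],
    "Total_Count": [5, 4, 3],
})

keys = ["u_id", "date", "social_interaction_type_id"]

# Without aggregation, unstack cannot put two values into one cell
try:
    df.set_index(keys)["Total_Count"].unstack()
    unstack_failed = False
except ValueError:
    unstack_failed = True

def pivot_with(aggfunc):
    """Pivot on (u_id, date) and combine repeated keys with aggfunc."""
    return (
        df.pivot_table(index=["u_id", "date"],
                       columns="social_interaction_type_id",
                       values="Total_Count",
                       aggfunc=aggfunc)
          .reset_index()
          .rename_axis(None, axis=1)
    )

averaged = pivot_with("mean")  # repeated rows are averaged
summed = pivot_with("sum")     # repeated rows are added up

print(unstack_failed)
print(averaged)
print(summed)
```

pivot_table defaults to aggfunc='mean', so the call without an explicit aggfunc behaves like the averaged variant here.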
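For reasons unknown to me, pd.DataFrame.pivot does not work with a list of values for index. According to the docs, the optional index must be a string or object. A workaround is to use pd.DataFrame.pivot_table with aggfunc='first':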
res = df.pivot_table(index=['u_id', 'date'], columns='social_interaction_type_id',
values='Total_Count', aggfunc='first').reset_index()
print(res)
social_interaction_type_id u_id date 1 2 4
0 4 2018-08-19 NaN NaN 5.0
1 4 2018-08-21 4.0 NaN NaN
2 4 2018-08-24 NaN 3.0 NaN
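A version caveat, to the best of my knowledge: starting with pandas 1.1.0, DataFrame.pivot accepts a list of column names for index, so on a recent install the call from the question should work without the workaround. A minimal sketch, assuming pandas 1.1.0 or later and the sample rows from the question:

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "u_id": [4, 4, 4],
    "date": ["2018-08-19", "2018-08-24", "2018-08-21"],
    "social_interaction_type_id": [4, 2, 1],
    "Total_Count": [5, 3, 4],
})

# Assumes pandas >= 1.1.0; older releases reject a list passed as index
res = (
    df.pivot(index=["u_id", "date"],
             columns="social_interaction_type_id",
             values="Total_Count")
      .reset_index()
      .rename_axis(None, axis=1)
)
print(res)
```

Unlike pivot_table, pivot never aggregates, so it still raises a ValueError when an index/column combination is duplicated. In that case pivot_table with an explicit aggfunc remains the way to go.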