Pandas value_counts into new columns
I have a time-series dataset that looks a bit like this:
ts userid v1 v2
2016-04-23 10:50:12 100001 10 ac
2016-04-23 11:23:29 100002 11 ad
2016-04-23 11:56:57 100002 11 ad
2016-04-23 12:33:38 100001 12 ae
2016-04-23 13:06:43 100001 13 aa
2016-04-23 14:16:34 100001 14 ag
2016-04-23 15:26:39 100002 15 ab
2016-04-23 23:29:31 100003 23 aw
I'd like to extract the count of v1 for each user into a new DataFrame similar to:
userid v1_0 ... v1_10 v1_11 v1_12 v1_13 v1_14 v1_15 ... v1_23
100001 0 ... 1 0 1 1 1 0 ... 0
100002 0 ... 0 2 0 0 0 1 ... 0
100003 0 ... 0 0 0 0 0 0 ... 1
v1 is the hour of the day (max. 24 values), which implies 24 new columns to be added.
v2 indicates the type of event.
v1_11 is 2 for userid 100002 because there were 2 events between 11 AM and noon.
Could someone please suggest how this can be achieved using pandas?
Thanks in advance.
Here's a snippet to recreate the original DataFrame:
import pandas as pd
l1 = ['2016-04-23 10:50:12', '2016-04-23 11:23:29', '2016-04-23 11:56:57',
'2016-04-23 12:33:38', '2016-04-23 13:06:43', '2016-04-23 14:16:34',
'2016-04-23 15:26:39', '2016-04-23 23:29:31']
l2 = [100001, 100002, 100002, 100001, 100001, 100001, 100002, 100003]
l3 = [10, 11, 11, 12, 13, 14, 15, 23]
l4 = ['ac','ad','ad','ae', 'aa','ag', 'ab', 'aw']
df = pd.DataFrame({'ts':l1, 'userid':l2, 'v1':l3, 'v2':l4})
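In this sample, every v1 value matches the hour component of ts (for example, 10:50:12 has v1 = 10). If that holds for the full dataset (an assumption, not something the question states), v1 could be derived straight from the timestamp with the `.dt.hour` accessor. A minimal sketch:

```python
import pandas as pd

# Sample data from the question (v2 omitted; v1 assumed to be the hour of ts)
df = pd.DataFrame({
    'ts': ['2016-04-23 10:50:12', '2016-04-23 11:23:29', '2016-04-23 11:56:57',
           '2016-04-23 12:33:38', '2016-04-23 13:06:43', '2016-04-23 14:16:34',
           '2016-04-23 15:26:39', '2016-04-23 23:29:31'],
    'userid': [100001, 100002, 100002, 100001, 100001, 100001, 100002, 100003],
    'v1': [10, 11, 11, 12, 13, 14, 15, 23],
})

# Parse the timestamp strings and extract the hour of the day (0-23)
hours = pd.to_datetime(df['ts']).dt.hour

# Check that v1 really is the hour of the day in this sample
print((hours == df['v1']).all())
```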
You can do it with crosstab:
pd.crosstab(df['userid'], df['v1'])
Out[30]:
v1 10 11 12 13 14 15 23
userid
100001 1 0 1 1 1 0 0
100002 0 2 0 0 0 1 0
100003 0 0 0 0 0 0 1
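Note that crosstab only creates columns for hours that actually occur (10 to 15 and 23 here). To get the exact layout asked for, with all 24 hours as columns named v1_0 ... v1_23, one option is to reindex the columns and add a prefix. A sketch, using only the columns the count needs:

```python
import pandas as pd

df = pd.DataFrame({
    'userid': [100001, 100002, 100002, 100001, 100001, 100001, 100002, 100003],
    'v1': [10, 11, 11, 12, 13, 14, 15, 23],
})

# Count events per (userid, hour); only hours that occur get a column
counts = pd.crosstab(df['userid'], df['v1'])

# Add the missing hours 0-23 as all-zero columns, then rename them to v1_<hour>
counts = counts.reindex(columns=range(24), fill_value=0).add_prefix('v1_')

# Drop the leftover 'v1' name on the columns axis so the header is just v1_0 ... v1_23
counts.columns.name = None
print(counts.loc[100002, 'v1_11'])
```

userid stays as the index here; calling `counts.reset_index()` would turn it back into a regular column.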
For other alternatives, take a look at this answer.
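One alternative that stays closer to the value_counts in the title is to count per user with groupby and then pivot the hours into columns with unstack. This is a sketch of that approach, not necessarily what the linked answer shows:

```python
import pandas as pd

df = pd.DataFrame({
    'userid': [100001, 100002, 100002, 100001, 100001, 100001, 100002, 100003],
    'v1': [10, 11, 11, 12, 13, 14, 15, 23],
})

# value_counts per user gives a Series indexed by (userid, v1)
per_user = df.groupby('userid')['v1'].value_counts()

# unstack moves v1 into columns (missing combinations become 0),
# then reindex adds hours nobody had and add_prefix names them v1_<hour>
wide = (per_user.unstack(fill_value=0)
                .reindex(columns=range(24), fill_value=0)
                .add_prefix('v1_'))
wide.columns.name = None
print(wide.loc[100002, 'v1_11'])
```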