Pandas value_counts into new columns

Question

I have a time-series dataset that looks something like this:

ts                  userid  v1   v2
2016-04-23 10:50:12 100001  10   ac
2016-04-23 11:23:29 100002  11   ad
2016-04-23 11:56:57 100002  11   ad
2016-04-23 12:33:38 100001  12   ae
2016-04-23 13:06:43 100001  13   aa
2016-04-23 14:16:34 100001  14   ag
2016-04-23 15:26:39 100002  15   ab
2016-04-23 23:29:31 100003  23   aw

I'd like to extract the count of each v1 value for each user into a new DataFrame, similar to this:

userid   v1_0 ... v1_10 v1_11 v1_12 v1_13 v1_14 v1_15 ... v1_23
100001     0  ...   1     0     1     1     1     0   ...   0
100002     0  ...   0     2     0     0     0     1   ...   0
100003     0  ...   0     0     0     0     0     0   ...   1

v1 is the hour of the day (at most 24 distinct values), which implies 24 new columns.
v2 indicates the type of event.
v1_11 is 2 for userid 100002 because there were 2 events between 11 AM and noon.

Could someone please suggest how this can be achieved using pandas?

Thanks in advance.

Here's a snippet to recreate the original DataFrame:

import pandas as pd

l1 = ['2016-04-23 10:50:12', '2016-04-23 11:23:29', '2016-04-23 11:56:57',
      '2016-04-23 12:33:38', '2016-04-23 13:06:43', '2016-04-23 14:16:34',
      '2016-04-23 15:26:39', '2016-04-23 23:29:31']
l2 = [100001, 100002, 100002, 100001, 100001, 100001, 100002, 100003]
l3 = [10, 11, 11, 12, 13, 14, 15, 23]
l4 = ['ac','ad','ad','ae', 'aa','ag', 'ab', 'aw']
df = pd.DataFrame({'ts':l1, 'userid':l2, 'v1':l3, 'v2':l4})

Answer 1

You can do it with crosstab:

pd.crosstab(df['userid'], df['v1'])
Out[30]: 
v1      10  11  12  13  14  15  23
userid                            
100001   1   0   1   1   1   0   0
100002   0   2   0   0   0   1   0
100003   0   0   0   0   0   0   1

For other alternatives, take a look at this answer.
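Note that the crosstab output above only has columns for the hours that actually occur in the data, whereas the question asks for all 24 columns named v1_0 through v1_23. A minimal sketch of one way to get there, assuming the hours run from 0 to 23 (the question says "max. 24 values" but does not state the range explicitly), is to reindex the columns and add a prefix. The variable name counts is only for illustration.

```python
import pandas as pd

# Same userid / v1 values as in the question's snippet.
df = pd.DataFrame({
    'userid': [100001, 100002, 100002, 100001, 100001, 100001, 100002, 100003],
    'v1': [10, 11, 11, 12, 13, 14, 15, 23],
})

counts = (
    pd.crosstab(df['userid'], df['v1'])
      # Assumption: hours are 0..23; hours never seen get a column of zeros.
      .reindex(columns=range(24), fill_value=0)
      # Rename columns 0..23 to v1_0..v1_23.
      .add_prefix('v1_')
)

print(counts.loc[100002, 'v1_11'])
```

Using fill_value=0 in reindex keeps the counts as integers, so no fillna or astype step is needed afterwards.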

Answer 2

This will do:

df.groupby('userid').v1.value_counts().unstack(0).reindex(range(24)).fillna(0).astype(int).T
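The one-liner above can be hard to follow, so here is a hedged step-by-step sketch of the same groupby / value_counts / unstack idea. It differs slightly from the original chain: it unstacks the v1 level directly instead of transposing, and it adds the v1_ prefix and a userid column to match the layout in the question. The names per_user, wide, and result are illustrative, and the 0 to 23 hour range is an assumption.

```python
import pandas as pd

# Same userid / v1 values as in the question's snippet.
df = pd.DataFrame({
    'userid': [100001, 100002, 100002, 100001, 100001, 100001, 100002, 100003],
    'v1': [10, 11, 11, 12, 13, 14, 15, 23],
})

# Step 1: count each v1 value per user; the result is a Series
# with a two-level index (userid, v1).
per_user = df.groupby('userid')['v1'].value_counts()

# Step 2: pivot the last index level (v1) into columns; missing
# (userid, v1) combinations become 0.
wide = per_user.unstack(fill_value=0)

# Step 3: make sure every hour has a column (assumed range 0..23)
# and rename the columns to v1_0..v1_23.
wide = wide.reindex(columns=range(24), fill_value=0).add_prefix('v1_')

# Step 4: turn userid back into an ordinary column.
result = wide.reset_index()

print(result[['userid', 'v1_10', 'v1_11', 'v1_23']])
```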
