Frequency count unique values Pandas
I have a Pandas Series as follows:
2014-05-24 23:59:49 1.3
2014-05-24 23:59:50 2.17
2014-05-24 23:59:50 1.28
2014-05-24 23:59:51 1.30
2014-05-24 23:59:51 2.17
2014-05-24 23:59:53 2.17
2014-05-24 23:59:58 2.17
Name: api_id, Length: 483677
I'm trying to count, for each id, the frequency per day. For now I'm doing this:
count = {}
for x in apis.unique():
    # older pandas API; in current pandas this is written .resample('D').count()
    count[x] = apis[apis == x].resample('D','count')
count_df = pd.DataFrame(count)
That gives me what I want, which is:
... 2.13 2.17 2.4 2.6 2.7 3.5(user) 3.9 4.2 5.1 5.6
timestamp ...
2014-05-22 ... 391 49962 3727 161 2 444 113 90 1398 90
2014-05-23 ... 450 49918 3861 187 1 450 170 90 629 90
2014-05-24 ... 396 46359 3603 172 3 513 171 89 622 90
But is there a way to do this without the for loop?
You can use the value_counts function for this (docs), applying it after a groupby (which is similar to the resample('D') you did, but resample expects an aggregated output, so we have to use the more general groupby in this case). With a small example:
In [16]: s = pd.Series([1,1,2,2,1,2,5,6,2,5,4,1], index=pd.date_range('2012-01-01', periods=12, freq='8H'))
In [17]: counts = s.groupby(pd.Grouper(freq='D')).value_counts()
In [18]: counts
Out[18]:
2012-01-01 1 2
2 1
2012-01-02 2 2
1 1
2012-01-03 2 1
6 1
5 1
2012-01-04 1 1
5 1
4 1
dtype: int64
To get this in the desired format, you can just unstack the result (this moves the second-level row index to the columns):
In [19]: counts.unstack()
Out[19]:
1 2 4 5 6
2012-01-01 2 1 NaN NaN NaN
2012-01-02 1 2 NaN NaN NaN
2012-01-03 NaN 1 NaN 1 1
2012-01-04 1 NaN 1 1 NaN
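For reference, the two steps above can be put together in one self-contained script. This is only a sketch on the same toy data. Passing fill_value=0 to unstack is my addition, not part of the session above; it replaces the NaN cells with zero counts and keeps the table as integers. The frequency is given as a pd.Timedelta rather than an alias string only to sidestep differences in alias spelling between pandas versions.

```python
import pandas as pd

# Same toy data as in the example: one value every 8 hours over 4 days.
s = pd.Series(
    [1, 1, 2, 2, 1, 2, 5, 6, 2, 5, 4, 1],
    index=pd.date_range("2012-01-01", periods=12, freq=pd.Timedelta(hours=8)),
)

# Group by calendar day, then count each distinct value within the day.
counts = s.groupby(pd.Grouper(freq="D")).value_counts()

# Move the value level to the columns; fill_value=0 (an addition here)
# turns "value not seen that day" into 0 instead of NaN.
table = counts.unstack(fill_value=0)
print(table)
```

Whether 0 or NaN is the better marker for "not seen that day" depends on what you do with the table afterwards; drop fill_value to get the NaN version shown above.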
Note: to use groupby(pd.Grouper(freq='D')) you need pandas 0.14 or later. If you have an older version, you can use groupby(pd.TimeGrouper(freq='D')) to obtain exactly the same result. This is also similar to doing groupby(s.index.date) (with the difference that you then have datetime.date objects in the index).
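To illustrate the groupby(s.index.date) variant from the note, here is a small sketch on the same toy data. The name by_date and the fill_value=0 argument are my own choices for the illustration. The counts should match the Grouper version; only the type of the row labels differs.

```python
import datetime
import pandas as pd

# Same toy data as in the example: one value every 8 hours over 4 days.
s = pd.Series(
    [1, 1, 2, 2, 1, 2, 5, 6, 2, 5, 4, 1],
    index=pd.date_range("2012-01-01", periods=12, freq=pd.Timedelta(hours=8)),
)

# Group on the plain calendar dates taken from the DatetimeIndex.
by_date = s.groupby(s.index.date).value_counts().unstack(fill_value=0)

# Row labels are datetime.date objects here, not pandas Timestamps.
print(type(by_date.index[0]))
print(by_date)
```

One practical consequence is that the resulting index is a plain object index, so time-based operations such as a later resample would need it converted back with pd.to_datetime first.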