Frequency count unique values Pandas

I have a pandas Series as follows:

2014-05-24 23:59:49     1.3
2014-05-24 23:59:50    2.17
2014-05-24 23:59:50    1.28
2014-05-24 23:59:51    1.30
2014-05-24 23:59:51    2.17
2014-05-24 23:59:53    2.17
2014-05-24 23:59:58    2.17
Name: api_id, Length: 483677

I'm trying to count, for each id, the frequency per day. For now I'm doing this:

count = {}
for x in apis.unique():
    # Daily count of the rows equal to x (older pandas spelled this resample('D', 'count'))
    count[x] = apis[apis == x].resample('D').count()
count_df = pd.DataFrame(count)

That gives me what I want, which is:

            ...    2.13   2.17   2.4  2.6  2.7  3.5(user)  3.9  4.2   5.1  5.6  
timestamp   ...                                                                 
2014-05-22  ...     391  49962  3727  161    2        444  113   90  1398   90  
2014-05-23  ...     450  49918  3861  187    1        450  170   90   629   90  
2014-05-24  ...     396  46359  3603  172    3        513  171   89   622   90  

But is there a way to do this without the for loop?

You can use the value_counts function for this (docs), applying it after a groupby (which is similar to the resample('D') you did, but resample expects an aggregated output, so we have to use the more general groupby in this case). With a small example:

In [16]: s = pd.Series([1,1,2,2,1,2,5,6,2,5,4,1], index=pd.date_range('2012-01-01', periods=12, freq='8H'))

In [17]: counts = s.groupby(pd.Grouper(freq='D')).value_counts()

In [18]: counts
Out[18]: 
2012-01-01  1    2
            2    1
2012-01-02  2    2
            1    1
2012-01-03  2    1
            6    1
            5    1
2012-01-04  1    1
            5    1
            4    1
dtype: int64

To get this in the desired format, you can just unstack it (move the second-level row index to the columns):

In [19]: counts.unstack()
Out[19]: 
             1   2   4   5   6
2012-01-01   2   1 NaN NaN NaN
2012-01-02   1   2 NaN NaN NaN
2012-01-03 NaN   1 NaN   1   1
2012-01-04   1 NaN   1   1 NaN
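
If you would rather have zeros than NaN for ids that do not occur on a given day, unstack accepts a fill_value argument. The following is a minimal sketch on the same toy data as above; the use of fill_value=0 and the variable name table are my own additions, not part of the original answer:

```python
import pandas as pd

# Same toy data as above: 12 values sampled every 8 hours.
s = pd.Series(
    [1, 1, 2, 2, 1, 2, 5, 6, 2, 5, 4, 1],
    index=pd.date_range("2012-01-01", periods=12, freq=pd.Timedelta(hours=8)),
)

# Count each distinct value per calendar day.
counts = s.groupby(pd.Grouper(freq="D")).value_counts()

# Move the value level to the columns; fill_value=0 (an assumption about
# what is wanted here) replaces the missing day/value combinations.
table = counts.unstack(fill_value=0)
print(table)
```

Each row of table is one day and each column one distinct value, so the row sums give the number of observations per day.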

Note: to use groupby(pd.Grouper(freq='D')) you need pandas 0.14. If you have an older version, you can use groupby(pd.TimeGrouper(freq='D')) to obtain exactly the same result. This is also similar to doing groupby(s.index.date) (with the difference that you then have datetime.date objects in the index).
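
As a rough illustration of the groupby(s.index.date) variant mentioned in the note, the sketch below reuses the toy series from the example. The name by_date and the fill_value=0 argument are illustrative choices of mine:

```python
import datetime

import pandas as pd

# Toy series from the example: 12 values sampled every 8 hours.
s = pd.Series(
    [1, 1, 2, 2, 1, 2, 5, 6, 2, 5, 4, 1],
    index=pd.date_range("2012-01-01", periods=12, freq=pd.Timedelta(hours=8)),
)

# Group on the calendar date of each timestamp instead of a Grouper.
by_date = s.groupby(s.index.date).value_counts().unstack(fill_value=0)

# The row labels are plain datetime.date objects, not Timestamps.
print(type(by_date.index[0]))
print(by_date.loc[datetime.date(2012, 1, 2)])
```

Because the row labels are datetime.date objects, rows are looked up with datetime.date(...) rather than with a date string or a Timestamp.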

