
Frequency count of unique values in a Pandas Series

I have a Pandas Series as follows:

2014-05-24 23:59:49     1.3
2014-05-24 23:59:50    2.17
2014-05-24 23:59:50    1.28
2014-05-24 23:59:51    1.30
2014-05-24 23:59:51    2.17
2014-05-24 23:59:53    2.17
2014-05-24 23:59:58    2.17
Name: api_id, Length: 483677

I'm trying to count, for each id, the frequency per day. For now I'm doing this:

import pandas as pd

# One daily count series per unique id, combined into a DataFrame
count = {}
for x in apis.unique():
    count[x] = apis[apis == x].resample('D').count()
count_df = pd.DataFrame(count)

That gives me what I want, which is:

            ...    2.13   2.17   2.4  2.6  2.7  3.5(user)  3.9  4.2   5.1  5.6  
timestamp   ...                                                                 
2014-05-22  ...     391  49962  3727  161    2        444  113   90  1398   90  
2014-05-23  ...     450  49918  3861  187    1        450  170   90   629   90  
2014-05-24  ...     396  46359  3603  172    3        513  171   89   622   90  

But is there a way to do this without the for loop?

You can use the value_counts function for this, applying it after a groupby (which is similar to the resample('D') you did, but resample expects an aggregated output, so we have to use the more general groupby in this case). With a small example:

In [16]: s = pd.Series([1,1,2,2,1,2,5,6,2,5,4,1], index=pd.date_range('2012-01-01', periods=12, freq='8H'))

In [17]: counts = s.groupby(pd.Grouper(freq='D')).value_counts()

In [18]: counts
Out[18]: 
2012-01-01  1    2
            2    1
2012-01-02  2    2
            1    1
2012-01-03  2    1
            6    1
            5    1
2012-01-04  1    1
            5    1
            4    1
dtype: int64

To get this in the desired format, you can just unstack it (move the second-level row index to the columns):

In [19]: counts.unstack()
Out[19]: 
             1   2   4   5   6
2012-01-01   2   1 NaN NaN NaN
2012-01-02   1   2 NaN NaN NaN
2012-01-03 NaN   1 NaN   1   1
2012-01-04   1 NaN   1   1 NaN
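
If you prefer integer zeros instead of NaN (closer to the table in your question), one small follow-up sketch, reusing the counts object from above, is to fill the missing values and cast to int:

In [20]: counts.unstack().fillna(0).astype(int)
Out[20]: 
            1  2  4  5  6
2012-01-01  2  1  0  0  0
2012-01-02  1  2  0  0  0
2012-01-03  0  1  0  1  1
2012-01-04  1  0  1  1  0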

Note: to use groupby(pd.Grouper(freq='D')) you need pandas 0.14. If you have an older version, you can use groupby(pd.TimeGrouper(freq='D')) to obtain exactly the same result. This is also similar to doing groupby(s.index.date) (with the difference that you then have datetime.date objects in the index).
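
For illustration, a minimal sketch of that groupby(s.index.date) alternative, using the same s as above (same counts, only the row index now holds datetime.date objects):

In [21]: s.groupby(s.index.date).value_counts().unstack()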
