
How do I count the frequency against a specific list?

I have a DataFrame that looks like this:

                date name
0 2015-06-13 00:21:25    a
1 2015-06-13 01:00:25    b
2 2015-06-13 02:54:48    c
3 2015-06-15 14:38:15    a
4 2015-06-15 15:29:28    b

I want to count the occurrences of dates against a specific date range, including dates that do not appear in the column (ignoring whatever is in the name column). For example, I might have a date range that looks like this:

periods = pd.date_range('2015-06-13', '2015-06-16', freq='D')

Then, I want an output that looks something like this:

date       count    
2015-06-13 3
2015-06-14 0
2015-06-15 2
2015-06-16 0

I haven't been able to find any function that lets me keep the 0 rows.

I think you can first take the date part of the date column and count it with value_counts, then reindex by periods and fill the missing dates with fillna(0). Finally, convert float to int with astype and call reset_index:

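# Count rows per calendar day (only days that actually occur).
counts = df['date'].dt.date.value_counts()
print(counts)
2015-06-13    3
2015-06-15    2
Name: date, dtype: int64

periods = pd.date_range('2015-06-13', '2015-06-16', freq='D')

# Missing dates become NaN after reindex, so fill with 0 and cast back to int.
result = counts.reindex(periods).fillna(0).astype(int).reset_index()
result.columns = ['date', 'count']
print(result)
        date  count
0 2015-06-13      3
1 2015-06-14      0
2 2015-06-15      2
3 2015-06-16      0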
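
For reference, the steps above can be put together into a self-contained sketch. It rebuilds the sample frame from the question and assumes it is acceptable to use dt.normalize() in place of dt.date, so that the counted index stays a DatetimeIndex that lines up with periods. The helper name count_per_day is made up for this illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'date': pd.to_datetime([
        '2015-06-13 00:21:25',
        '2015-06-13 01:00:25',
        '2015-06-13 02:54:48',
        '2015-06-15 14:38:15',
        '2015-06-15 15:29:28',
    ]),
    'name': ['a', 'b', 'c', 'a', 'b'],
})

periods = pd.date_range('2015-06-13', '2015-06-16', freq='D')


def count_per_day(dates, periods):
    """Hypothetical helper: count rows per day, keeping empty days as 0."""
    # Truncate each timestamp to midnight, then count rows per day.
    per_day = dates.dt.normalize().value_counts()
    # Reindexing adds the missing days as NaN; fill them and restore ints.
    out = per_day.reindex(periods).fillna(0).astype(int).reset_index()
    out.columns = ['date', 'count']
    return out


result = count_per_day(df['date'], periods)
print(result)
```

The explicit column assignment at the end avoids relying on whatever names value_counts and reset_index happen to produce in a given pandas version.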

This is very similar to the solution of @jezrael, but uses a groupby instead of value_counts:

>>> (pd.DataFrame(df.groupby(df.date.dt.date)['name']
                    .count()
                    .reindex(periods)
                    .fillna(0))
     .rename(columns={'name': 'count'}))
            count
2015-06-13      3
2015-06-14      0
2015-06-15      2
2015-06-16      0

Note: In pandas 0.18.0 the reindex operation changes the type of count from int to float (the missing dates are filled with NaN before fillna runs), so if you are using that version you'll need to append .astype(int) to the end.
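
One way to sidestep the float conversion is to pass fill_value=0 to reindex, so that missing dates are filled with an integer directly rather than with NaN. The following is a minimal sketch of that variation, not the code from the answer above; it assumes the sample data from the question and groups on dt.normalize() so that the group keys match periods.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'date': pd.to_datetime([
        '2015-06-13 00:21:25',
        '2015-06-13 01:00:25',
        '2015-06-13 02:54:48',
        '2015-06-15 14:38:15',
        '2015-06-15 15:29:28',
    ]),
    'name': ['a', 'b', 'c', 'a', 'b'],
})

periods = pd.date_range('2015-06-13', '2015-06-16', freq='D')

# Group rows by calendar day, count them, and fill absent days with 0.
# Because no NaN is ever introduced, the counts keep their integer dtype.
counts = (df.groupby(df['date'].dt.normalize())['name']
            .count()
            .reindex(periods, fill_value=0)
            .rename('count'))
print(counts)
```

With this form there is no need for a trailing fillna(0) or astype(int).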
