How to count frequencies against a specific list?

Question

I have a DataFrame that looks like this:

                date name
0 2015-06-13 00:21:25    a
1 2015-06-13 01:00:25    b
2 2015-06-13 02:54:48    c
3 2015-06-15 14:38:15    a
4 2015-06-15 15:29:28    b

I want to count the occurrences of dates against a specific date range, including dates that do not appear in the column (ignoring whatever is in the name column). For example, I might have a date range that looks like this:

periods = pd.date_range('2015-06-13', '2015-06-16', freq = 'd')

Then, I want an output that looks something like this:

date       count    
2015-06-13 3
2015-06-14 0
2015-06-15 2
2015-06-16 0

I haven't been able to find any function that lets me keep the rows with a count of 0.
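The question doesn't show how the frame was built. A minimal reproducible setup for the sample data and for periods, assuming the date column holds datetime64 values (parsed here with pd.to_datetime from the strings shown above):

```python
import pandas as pd

# Rebuild the sample frame from the question; assumes 'date' should be datetime64.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2015-06-13 00:21:25",
        "2015-06-13 01:00:25",
        "2015-06-13 02:54:48",
        "2015-06-15 14:38:15",
        "2015-06-15 15:29:28",
    ]),
    "name": ["a", "b", "c", "a", "b"],
})

# date_range includes both endpoints by default, so this yields four daily Timestamps.
periods = pd.date_range("2015-06-13", "2015-06-16", freq="D")
print(df)
print(periods)
```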

Answer 1

I think you can first take the date part of column date with dt.date and count it with value_counts, then reindex by periods and replace the resulting NaN values with 0 using fillna(0). Finally, convert the float counts back to int with astype(int) and turn the index into a column with reset_index:

import pandas as pd

df = df['date'].dt.date.value_counts()
print(df)
2015-06-13    3
2015-06-15    2
Name: date, dtype: int64

periods = pd.date_range('2015-06-13', '2015-06-16', freq = 'd')

df = df.reindex(periods).fillna(0).astype(int).reset_index()
df.columns = ['date','count']
print(df)
        date  count
0 2015-06-13      3
1 2015-06-14      0
2 2015-06-15      2
3 2015-06-16      0
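A self-contained Python 3 sketch of the same value_counts plus reindex idea. It makes two swaps, both standard pandas features rather than part of the original answer: dt.normalize() instead of dt.date, so the counted index stays as Timestamps like periods; and reindex(..., fill_value=0) instead of fillna(0).astype(int), so missing dates are filled with 0 directly and the counts stay integers. The final rename_axis/reset_index(name=...) step is one way to get the date/count column layout.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime([
        "2015-06-13 00:21:25", "2015-06-13 01:00:25", "2015-06-13 02:54:48",
        "2015-06-15 14:38:15", "2015-06-15 15:29:28",
    ]),
    "name": ["a", "b", "c", "a", "b"],
})
periods = pd.date_range("2015-06-13", "2015-06-16", freq="D")

# Truncate each timestamp to midnight; unlike dt.date this keeps datetime64 values.
counts = df["date"].dt.normalize().value_counts()

# Align to the full range; dates with no rows get 0 instead of NaN.
counts = counts.reindex(periods, fill_value=0)

# Turn the index into a 'date' column next to a 'count' column.
result = counts.rename_axis("date").reset_index(name="count")
print(result)
```

Because fill_value=0 can be stored in an integer column, no float round trip happens, which avoids the extra astype(int) step.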

Answer 2

This is very similar to the solution of @jezrael, but uses a groupby instead of value_counts:

>>> (pd.DataFrame(df.groupby(df.date.dt.date)['name']
                    .count()
                    .reindex(periods)
                    .fillna(0))
     .rename(columns={'name': 'count'}))
            count
2015-06-13      3
2015-06-14      0
2015-06-15      2
2015-06-16      0

Note: In pandas 0.18.0 (and later versions), the reindex operation changes the type of count from int to float, because the missing dates are first filled with NaN. If you are using such a version, you'll need to tack .astype(int) onto the end.
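A self-contained Python 3 sketch of the groupby approach with the .astype(int) that the note mentions. Two choices here differ from the original answer: it uses size(), which counts rows regardless of what is in name (['name'].count() would skip rows whose name is missing), and it uses to_frame(name='count') instead of pd.DataFrame(...).rename(...).

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime([
        "2015-06-13 00:21:25", "2015-06-13 01:00:25", "2015-06-13 02:54:48",
        "2015-06-15 14:38:15", "2015-06-15 15:29:28",
    ]),
    "name": ["a", "b", "c", "a", "b"],
})
periods = pd.date_range("2015-06-13", "2015-06-16", freq="D")

# Group rows by calendar day (kept as Timestamps) and count rows per day.
daily = df.groupby(df["date"].dt.normalize()).size()

# Missing days become NaN on reindex, so fill with 0 and cast back to int.
result = daily.reindex(periods).fillna(0).astype(int).to_frame(name="count")
print(result)
```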

2 answers

Answer 1: 2 votes, accepted, 2016-04-04 05:43:25

Answer 2: 1 vote, 2016-04-04 06:00:47