Pandas dataframe: group by date/month and count by category

Question

I have a dataframe with this structure:

import pandas as pd

df = pd.DataFrame({
    "name": ["Victor Hugo", "Emile Zola", "Paul Verlaine", "Charles Baudelaire"],
    "date_enrolled": ["2020-05-20 08:48:21+00:00", "2020-05-05 17:30:11+00:00",
                      "2020-05-22 01:11:24+00:00", "2020-07-29 09:32:10+00:00"],
    "cursus": ["AAA", "AAA", "BBB", "AAA"],
})

I am trying to obtain something like this:

period   AAA  BBB
2020-05    2    1
2020-06    0    0
2020-07    1    0

In short: one column per cursus containing the number of enrolled names, grouped by period (YYYY-MM, or potentially another date grouping or format), and covering every period in the range, including empty ones (like 2020-06 in my example).

I have tried many approaches, but none of them gives me what I want.

Thank you for any assistance.

Answer 1

Convert date_enrolled to YYYY-MM periods with Series.dt.to_period, count the names per cursus with DataFrame.pivot_table, and then add the missing months with DataFrame.reindex:

In [937]: df.date_enrolled = pd.to_datetime(df.date_enrolled).dt.to_period('M')

In [947]: ans = df.pivot_table(index='date_enrolled', columns='cursus', aggfunc='count', fill_value=0)

In [979]: ans = ans.reindex(pd.period_range(ans.index[0], ans.index[-1],freq='M'), fill_value=0)

In [980]: ans
Out[980]: 
        name    
cursus   AAA BBB
2020-05    2   1
2020-06    0   0
2020-07    1   0
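The output above keeps an extra `name` column level because `pivot_table` counts every non-index, non-column field. A minimal self-contained sketch of the same idea, assuming you want flat `AAA`/`BBB` columns: passing `values="name"` selects the counted column explicitly. The `.dt.tz_convert(None)` step is an addition not in the answer; it drops the UTC offset before `to_period` so pandas does not warn about discarding timezone information.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Victor Hugo", "Emile Zola", "Paul Verlaine", "Charles Baudelaire"],
    "date_enrolled": ["2020-05-20 08:48:21+00:00", "2020-05-05 17:30:11+00:00",
                      "2020-05-22 01:11:24+00:00", "2020-07-29 09:32:10+00:00"],
    "cursus": ["AAA", "AAA", "BBB", "AAA"],
})

# Parse as timezone-aware timestamps, convert to naive UTC,
# then bucket each timestamp into its calendar month.
months = pd.to_datetime(df["date_enrolled"]).dt.tz_convert(None).dt.to_period("M")

# values="name" names the counted column, so the columns are just the
# cursus labels instead of a ("name", cursus) MultiIndex.
ans = df.assign(period=months).pivot_table(
    index="period", columns="cursus", values="name", aggfunc="count", fill_value=0
)

# Add months with no enrolments (2020-06 here) as rows of zeros.
full_range = pd.period_range(ans.index.min(), ans.index.max(), freq="M", name="period")
ans = ans.reindex(full_range, fill_value=0)
print(ans)
```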

Answer 2

Use crosstab after converting date_enrolled to monthly periods with Series.dt.to_period, and then add the missing months with DataFrame.reindex:

df['date_enrolled'] = pd.to_datetime(df['date_enrolled'])

df = pd.crosstab(df['date_enrolled'].dt.to_period('M'), df['cursus'])

df = df.reindex(pd.period_range(df.index.min(), df.index.max(), name='period'), fill_value=0)
print(df)
cursus   AAA  BBB
period           
2020-05    2    1
2020-06    0    0
2020-07    1    0
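Since the question also asks about other date groupings, the crosstab + reindex approach above can be parameterised by the period frequency. This is a sketch only: `count_by_period` is a hypothetical helper name, and it assumes the column names from the question (`date_enrolled`, `cursus`). Any pandas period alias such as `"M"` (month), `"Q"` (quarter) or `"W"` (week) can be passed as `freq`.

```python
import pandas as pd


def count_by_period(df: pd.DataFrame, freq: str = "M") -> pd.DataFrame:
    """Hypothetical helper: count rows per (period, cursus), filling
    periods that have no rows with 0."""
    # Drop the timezone first so to_period does not warn about losing it.
    dates = pd.to_datetime(df["date_enrolled"]).dt.tz_convert(None)
    periods = dates.dt.to_period(freq).rename("period")
    counts = pd.crosstab(periods, df["cursus"])
    # Every period between the first and last observed one, same frequency.
    full_range = pd.period_range(counts.index.min(), counts.index.max(),
                                 freq=freq, name="period")
    return counts.reindex(full_range, fill_value=0)


df = pd.DataFrame({
    "name": ["Victor Hugo", "Emile Zola", "Paul Verlaine", "Charles Baudelaire"],
    "date_enrolled": ["2020-05-20 08:48:21+00:00", "2020-05-05 17:30:11+00:00",
                      "2020-05-22 01:11:24+00:00", "2020-07-29 09:32:10+00:00"],
    "cursus": ["AAA", "AAA", "BBB", "AAA"],
})

print(count_by_period(df, "M"))  # monthly, as in the question
print(count_by_period(df, "Q"))  # same data grouped by quarter
```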

Or with DataFrame.asfreq:

df['date_enrolled'] = pd.to_datetime(df['date_enrolled'])

df = (pd.crosstab(df['date_enrolled'].dt.to_period('M').dt.to_timestamp(), df['cursus'])
        .asfreq('MS', fill_value=0)
        .to_period('M'))
print(df)

cursus         AAA  BBB
date_enrolled          
2020-05          2    1
2020-06          0    0
2020-07          1    0
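The same asfreq idea also works without going through periods first. The sketch below swaps in a different technique, `groupby` with `pd.Grouper(freq="MS")` followed by `unstack`, instead of `crosstab`; it assumes the question's column names, and `date` is a hypothetical helper column added for the grouping.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Victor Hugo", "Emile Zola", "Paul Verlaine", "Charles Baudelaire"],
    "date_enrolled": ["2020-05-20 08:48:21+00:00", "2020-05-05 17:30:11+00:00",
                      "2020-05-22 01:11:24+00:00", "2020-07-29 09:32:10+00:00"],
    "cursus": ["AAA", "AAA", "BBB", "AAA"],
})

# Naive UTC timestamps, so the monthly bins are plain calendar months.
dates = pd.to_datetime(df["date_enrolled"]).dt.tz_convert(None)

counts = (
    df.assign(date=dates)
    # Bin dates by month start and count rows per (month, cursus) pair.
    .groupby([pd.Grouper(key="date", freq="MS"), "cursus"])
    .size()
    # One column per cursus; missing (month, cursus) pairs become 0.
    .unstack("cursus", fill_value=0)
    # Insert month starts absent from the data (2020-06 here) as zeros.
    .asfreq("MS", fill_value=0)
    # Turn the DatetimeIndex of month starts into monthly periods.
    .to_period("M")
)
print(counts)
```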

Finally, if you need the period as a regular column instead of the index (shown here on the reindex result, whose index is named period), use:

df = df.reset_index().rename_axis(None, axis=1)
print(df)

    period  AAA  BBB
0  2020-05    2    1
1  2020-06    0    0
2  2020-07    1    0
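The `period` column above still holds pandas `Period` objects. If a plain string in a specific format is wanted (the question mentions YYYY-MM "or potentially other" formats), a minimal sketch, assuming the crosstab + reindex result from this answer, is to format it with `Series.dt.strftime`; any strftime pattern can replace `"%Y-%m"`.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Victor Hugo", "Emile Zola", "Paul Verlaine", "Charles Baudelaire"],
    "date_enrolled": ["2020-05-20 08:48:21+00:00", "2020-05-05 17:30:11+00:00",
                      "2020-05-22 01:11:24+00:00", "2020-07-29 09:32:10+00:00"],
    "cursus": ["AAA", "AAA", "BBB", "AAA"],
})

# Monthly counts per cursus, with empty months filled by 0.
months = pd.to_datetime(df["date_enrolled"]).dt.tz_convert(None).dt.to_period("M")
counts = pd.crosstab(months.rename("period"), df["cursus"])
counts = counts.reindex(
    pd.period_range(counts.index.min(), counts.index.max(), freq="M", name="period"),
    fill_value=0,
)

# Move the period index into a column and drop the "cursus" columns label.
out = counts.reset_index().rename_axis(None, axis=1)
# Convert Period values to plain strings in the chosen format.
out["period"] = out["period"].dt.strftime("%Y-%m")
print(out)
```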

Answer 1: 1 vote, posted 2021-02-10 13:17:20. Answer 2: 1 vote, accepted, posted 2021-02-10 13:18:57.
