Monthly counts and cumulative sum in a pandas DataFrame

Question

I have a pandas DataFrame with two columns: id and processed_date.

The latter is the date on which the item (id) was processed.

import pandas as pd
# df
  id  processed_date
 324      2016-07-08
A550      2016-07-09
  79      2016-08-10
C295      2016-08-10
 413      2016-08-11
...
 111      2021-11-08 
 709      2021-11-08

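I want to plot one chart showing the number of items processed each month and a second chart showing the cumulative total over the months. Since I have 5 years and 4 months of data, I should end up with 64 entries, i.e. 64 data points to plot as a bar or line chart.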

Here is what I have so far, but it does not work as expected:

df['date'] = pd.to_datetime(df['processed_date']) # needed by the nature of the data
df.set_index('date')

df = df.groupby('date')['id'].count() # <- this will stack items per day
df = df.groupby(df.index.month)['id'].count() # <- this will stack items per 12 months, but I have 5 years and 4 months of data, hence 64 different months, not 12.

How can I achieve this?

Desired output:

# df
  nb_items_processed  cum_sum year_month
                   2        2    2016-07
                   3        5    2016-08
...
                   2      xxx    2021-11

Answer 1

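Get the monthly counts with groupby().size(), then take the cumsum without a groupby. This assumes processed_date is already a datetime column: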

out = df.groupby(pd.Grouper(key='processed_date', freq='M')).size().reset_index(name='nb_items_processed')

out['cum_sum'] = out['nb_items_processed'].cumsum()
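
For reference, here is a minimal end-to-end sketch of the same idea that also produces the year_month column from the desired output. The sample rows are hypothetical, copied from the top of the question's data. The sketch groups with .dt.to_period('M') instead of pd.Grouper, which is a swapped-in variant rather than the answer's exact call. One difference to be aware of: with to_period, months that contain no rows simply do not appear, whereas pd.Grouper with a monthly frequency emits them with a count of 0.

```python
import pandas as pd

# Hypothetical sample data mirroring the first rows shown in the question.
df = pd.DataFrame({
    "id": ["324", "A550", "79", "C295", "413"],
    "processed_date": ["2016-07-08", "2016-07-09", "2016-08-10",
                       "2016-08-10", "2016-08-11"],
})

# The dates arrive as strings, so convert them to real datetimes first.
df["processed_date"] = pd.to_datetime(df["processed_date"])

# Group by year AND month together (a monthly Period), then count rows.
# This differs from grouping on .month alone, which merges the same
# calendar month across different years into one of 12 buckets.
out = (
    df.groupby(df["processed_date"].dt.to_period("M"))
      .size()
      .reset_index(name="nb_items_processed")
      .rename(columns={"processed_date": "year_month"})
)

# Running total over the months; no groupby is needed here.
out["cum_sum"] = out["nb_items_processed"].cumsum()

# Render the period as a 'YYYY-MM' label for display or plotting.
out["year_month"] = out["year_month"].astype(str)

print(out)
```

With matplotlib installed, something like out.plot(x="year_month", y="nb_items_processed", kind="bar") and out.plot(x="year_month", y="cum_sum") should give the two charts asked for.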

Answer 1: 3, accepted, 2021-11-08 18:56:04