

Aggregating weekly data by group into monthly sums in pandas

This seems pretty straightforward to do, but I'm very new to pandas and I'm not sure where to start. I have a dataset that contains weekly data for multiple clinics. Every week begins on a Sunday and ends on a Saturday. I'd like to aggregate it into monthly data and keep it sorted by clinic.

This is what it currently looks like:

  In [2]: df
  Out[2]:
  Week                         Clinic Appointments Cancellations
  2021-11-28 to 2021-12-04     fee    40            4
  2021-11-28 to 2021-12-04     fi     21            2
  2021-12-05 to 2021-12-11     fee    36            3
  2022-02-20 to 2022-02-26     fee    10            1
  2022-02-27 to 2022-03-05     fee    45            3
  2022-02-27 to 2022-03-05     fi     30            1
  TOTAL (all clinics)          ---    182           14

And this is what I want it to become:

  Month     Clinic Appointments  Cancellations
  Nov '21   fee     40           4
  Nov '21   fi      21           2
  Dec '21   fee     36           3
  Feb '22   fee     55           4
  Feb '22   fi      30           1
  TOTAL     ---     182          14

So the rule I would use to assign a week to a month is its beginning date: if the Sunday falls within a month, the whole week belongs to that month. Also, not all clinics will have data for every week.
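The grouping rule above (a week belongs to the month containing its Sunday start date) can be sketched with `Series.dt.to_period("M")`. This is a minimal sketch that assumes every Week value has the form `YYYY-MM-DD to YYYY-MM-DD`; the TOTAL row is left out here. Note that the week starting 2022-02-27 lands in February even though it ends in March.

```python
import pandas as pd

# Weekly ranges from the question (the TOTAL row is omitted in this sketch)
weeks = pd.Series([
    "2021-11-28 to 2021-12-04",
    "2021-12-05 to 2021-12-11",
    "2022-02-20 to 2022-02-26",
    "2022-02-27 to 2022-03-05",
])

# The start date is the text before " to "
start = pd.to_datetime(weeks.str.split(" to ").str[0], format="%Y-%m-%d")

# Sanity check: every start date is a Sunday (dayofweek: Monday=0 ... Sunday=6)
all_sundays = bool((start.dt.dayofweek == 6).all())

# Assign each week to the calendar month of its Sunday start date
month = start.dt.to_period("M")

print(all_sundays)                 # True
print(month.astype(str).tolist())  # ['2021-11', '2021-12', '2022-02', '2022-02']
```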

What I've tried:

I've been trying to use

df.groupby(['Clinic', 'Week'])

but from there I'm not sure how to aggregate the sorted weekly data and return it as a new Excel worksheet in the format I want. Any hints would be welcome.
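One way to get from the weekly table to the requested layout, sketched under the assumption that the data looks exactly like the sample above (including the `TOTAL (all clinics)` row), is to drop the TOTAL row, group by the start-date month and clinic, sum, and only then format the month labels. Grouping on a monthly `Period` rather than a string keeps the months in chronological order, and a clinic with no data in a given month simply produces no row for it. The DataFrame below is rebuilt by hand from the question's table for illustration.

```python
import pandas as pd

# Hand-built copy of the question's sample table, TOTAL row included
df = pd.DataFrame({
    "Week": [
        "2021-11-28 to 2021-12-04",
        "2021-11-28 to 2021-12-04",
        "2021-12-05 to 2021-12-11",
        "2022-02-20 to 2022-02-26",
        "2022-02-27 to 2022-03-05",
        "2022-02-27 to 2022-03-05",
        "TOTAL (all clinics)",
    ],
    "Clinic": ["fee", "fi", "fee", "fee", "fee", "fi", "---"],
    "Appointments": [40, 21, 36, 10, 45, 30, 182],
    "Cancellations": [4, 2, 3, 1, 3, 1, 14],
})

# Start date = text before " to "; the TOTAL row does not parse and becomes NaT
start = pd.to_datetime(
    df["Week"].str.split(" to ").str[0], format="%Y-%m-%d", errors="coerce"
)

# Keep only real weekly rows; the total is recomputed at the end
weekly = df[start.notna()].copy()
weekly["Month"] = start.dt.to_period("M")  # month of the Sunday, aligned on the index

# Sum per (month, clinic); Period keys sort chronologically
monthly = (
    weekly.groupby(["Month", "Clinic"])[["Appointments", "Cancellations"]]
    .sum()
    .reset_index()
)

# Turn Period values into labels such as "Nov '21" after sorting
monthly["Month"] = monthly["Month"].dt.strftime("%b '%y")

# Append a recomputed TOTAL row
total = pd.DataFrame({
    "Month": ["TOTAL"],
    "Clinic": ["---"],
    "Appointments": [monthly["Appointments"].sum()],
    "Cancellations": [monthly["Cancellations"].sum()],
})
monthly = pd.concat([monthly, total], ignore_index=True)
print(monthly)
```

To write the result to a new worksheet, something like `monthly.to_excel("monthly.xlsx", sheet_name="Monthly", index=False)` should work, assuming an Excel engine such as openpyxl is installed; the file and sheet names are placeholders.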

'Week' is not in the year_month format your expected output needs, so you first need to convert it into year_month with:

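import pandas as pd
date = df['Week'].str.split(' ', expand=True)[0]  # start date (the Sunday) of each week
year_month = pd.to_datetime(date, errors='coerce').dt.strftime('%Y-%b').fillna(date)  # non-dates such as "TOTAL" keep their text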

before you use groupby:

df.groupby([year_month, 'Clinic'])[['Appointments', 'Cancellations']].sum()
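Here is that approach applied to a DataFrame rebuilt by hand from the question's sample table (the data is reconstructed for illustration). The explicit `[['Appointments', 'Cancellations']]` selection keeps the text `Week` column out of the sum. Because the `'%Y-%b'` labels are plain strings, `groupby` sorts them alphabetically, so `2021-Dec` comes before `2021-Nov`.

```python
import pandas as pd

# Hand-built copy of the question's sample table, TOTAL row included
df = pd.DataFrame({
    "Week": [
        "2021-11-28 to 2021-12-04",
        "2021-11-28 to 2021-12-04",
        "2021-12-05 to 2021-12-11",
        "2022-02-20 to 2022-02-26",
        "2022-02-27 to 2022-03-05",
        "2022-02-27 to 2022-03-05",
        "TOTAL (all clinics)",
    ],
    "Clinic": ["fee", "fi", "fee", "fee", "fee", "fi", "---"],
    "Appointments": [40, 21, 36, 10, 45, 30, 182],
    "Cancellations": [4, 2, 3, 1, 3, 1, 14],
})

# First word of 'Week' is the start date; 'TOTAL' fails to parse, becomes NaT,
# formats to NaN, and is restored by fillna
date = df['Week'].str.split(' ', expand=True)[0]
year_month = (
    pd.to_datetime(date, errors='coerce')
    .dt.strftime('%Y-%b')
    .fillna(date)
    .rename('Month')
)
print(year_month.tolist())

# Sum only the numeric columns for each (month, clinic) pair
result = df.groupby([year_month, 'Clinic'])[['Appointments', 'Cancellations']].sum()
print(result)
```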

Just to add to Raymond's answer above, using:

dt.strftime('%Y-%m')

instead of

dt.strftime('%Y-%b')

will sort the output correctly, because zero-padded month numbers sort chronologically as strings, whereas month abbreviations sort alphabetically.
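A quick check of that difference, using three of the question's Sunday start dates deliberately listed out of order:

```python
import pandas as pd

# Three Sunday start dates from the question, deliberately out of order
starts = pd.to_datetime(pd.Series(["2022-02-20", "2021-12-05", "2021-11-28"]))

by_name = sorted(starts.dt.strftime("%Y-%b"))    # labels like '2021-Dec'
by_number = sorted(starts.dt.strftime("%Y-%m"))  # labels like '2021-12'

print(by_name)    # ['2021-Dec', '2021-Nov', '2022-Feb']
print(by_number)  # ['2021-11', '2021-12', '2022-02']
```

If the final output should still show names such as "Nov '21", the labels can be reformatted after sorting.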

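Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.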

© 2020-2024 STACKOOM.COM