
How to count the daily number of cases within a month by using Pandas' DataFrame?

I would like to count the number of cases per ID within each month, starting from data like this:

import pandas as pd

d1 = pd.DataFrame({'ID': ["A", "A", "A", "B", "B", "C", "C", "C", "C", "D", "D", "D"],
                   "date": ["2010-12-30", "2010-02-27", "2010-02-26", "2012-01-01", "2012-01-03",
                            "2011-01-01", "2011-01-02", "2011-01-08", "2014-02-21", "2010-08-31", "2010-08-30", "2010-09-01"]})

The final outcome should look like this:

  ID year_month  count
0  A    2010-02      2
1  A    2010-12      1
2  B    2012-01      2
3  C    2011-01      3
4  C    2014-02      1
5  D    2010-08      2
6  D    2010-09      1

Do you have any ideas on how to produce a DataFrame like the one above? I tried the groupby and apply functions but could not get this result. Thanks in advance!

Use Series.dt.to_period to get month periods and count with GroupBy.size:

# The sample 'date' column holds strings, so convert it before using .dt
d1['date'] = pd.to_datetime(d1['date'])

df = (d1.groupby(['ID', d1['date'].dt.to_period('M').rename('year_month')])
        .size()
        .reset_index(name='count'))
print(df)
  ID year_month  count
0  A    2010-02      2
1  A    2010-12      1
2  B    2012-01      2
3  C    2011-01      3
4  C    2014-02      1
5  D    2010-08      2
6  D    2010-09      1
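If only one particular month is of interest, the period-typed year_month column can be compared against a single pd.Period. The following is a minimal sketch on a shortened copy of the sample data; the names target and one_month are illustrative and not part of the original answer.

```python
import pandas as pd

# Shortened copy of the sample data, already converted to datetimes
d1 = pd.DataFrame({
    "ID": ["A", "A", "A", "B"],
    "date": pd.to_datetime(["2010-12-30", "2010-02-27", "2010-02-26", "2012-01-01"]),
})

# Same grouping as in the answer: one row per (ID, month)
df = (d1.groupby(["ID", d1["date"].dt.to_period("M").rename("year_month")])
        .size()
        .reset_index(name="count"))

# Keep only the rows that fall in February 2010
target = pd.Period("2010-02", freq="M")
one_month = df[df["year_month"] == target]
print(one_month)
```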

Another idea, using Series.dt.strftime:

# The sample 'date' column holds strings, so convert it before using .dt
d1['date'] = pd.to_datetime(d1['date'])

df = (d1.groupby(['ID', d1['date'].dt.strftime('%Y-%m').rename('year_month')])
        .size()
        .reset_index(name='count'))
print(df)
  ID year_month  count
0  A    2010-02      2
1  A    2010-12      1
2  B    2012-01      2
3  C    2011-01      3
4  C    2014-02      1
5  D    2010-08      2
6  D    2010-09      1

If the date column holds strings rather than datetimes, slice the first seven characters:

df = (d1.groupby(['ID', d1['date'].str[:7].rename('year_month')])
        .size()
        .reset_index(name='count'))
print(df)
  ID year_month  count
0  A    2010-02      2
1  A    2010-12      1
2  B    2012-01      2
3  C    2011-01      3
4  C    2014-02      1
5  D    2010-08      2
6  D    2010-09      1
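The three variants print the same table but differ in the type of the grouping key: to_period yields a period[M] column, while strftime and string slicing yield plain strings. The sketch below checks this on the question's data; the helper count_by is a hypothetical name introduced only for this comparison.

```python
import pandas as pd

d1 = pd.DataFrame({
    "ID": ["A", "A", "A", "B", "B", "C", "C", "C", "C", "D", "D", "D"],
    "date": ["2010-12-30", "2010-02-27", "2010-02-26", "2012-01-01", "2012-01-03",
             "2011-01-01", "2011-01-02", "2011-01-08", "2014-02-21",
             "2010-08-31", "2010-08-30", "2010-09-01"],
})

# Parse once without modifying d1, so the string-slicing variant still works
dates = pd.to_datetime(d1["date"])

def count_by(key):
    """Count rows per (ID, key); `key` is a Series aligned with d1 (hypothetical helper)."""
    return (d1.groupby(["ID", key.rename("year_month")])
              .size()
              .reset_index(name="count"))

by_period = count_by(dates.dt.to_period("M"))
by_strftime = count_by(dates.dt.strftime("%Y-%m"))
by_slice = count_by(d1["date"].str[:7])

# The counts agree across all three variants
print(by_period["count"].tolist())
print(by_strftime["count"].tolist() == by_slice["count"].tolist())

# Only the period variant keeps a dedicated time dtype for the key
print(by_period["year_month"].dtype)
```

Note that string slicing only works when every value is written as YYYY-MM-DD; the datetime-based variants do not depend on the textual layout once parsing has succeeded.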

Using apply and groupby should work:

import pandas as pd

d1 = pd.DataFrame({'ID': ["A", "A", "A", "B", "B", "C", "C", "C", "C", "D", "D", "D"],
                   "date": ["2010-12-30", "2010-02-27", "2010-02-26", "2012-01-01", "2012-01-03",
                            "2011-01-01", "2011-01-02", "2011-01-08", "2014-02-21", "2010-08-31", "2010-08-30", "2010-09-01"]})

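# Apply the slice to the 'date' column itself; DataFrame.apply without axis=1
# would pass whole columns to the lambda and raise a KeyError.
d1["month_year"] = d1["date"].apply(lambda s: s[:7])
month_year = d1.groupby("month_year").size().reset_index(name="count")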

print(month_year)

This results in:

  month_year  count
0    2010-02      2
1    2010-08      2
2    2010-09      1
3    2010-12      1
4    2011-01      3
5    2012-01      2
6    2014-02      1

You will probably want to change the apply lambda to handle the date more carefully. Note that this groups by month only; to get the per-ID counts asked for in the question, group by ["ID", "month_year"] instead.
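One way to handle the dates more carefully is to parse them with pd.to_datetime instead of slicing strings, and to drop values that do not parse. This is only a sketch under assumptions: the "not a date" row is invented here to show the behaviour, and the YYYY-MM-DD format is assumed from the sample data.

```python
import pandas as pd

# Small example with one deliberately malformed value (invented for illustration)
d1 = pd.DataFrame({
    "ID": ["A", "A", "B", "B"],
    "date": ["2010-02-27", "2010-02-26", "2012-01-01", "not a date"],
})

# Unparseable values become NaT instead of raising an error
d1 = d1.assign(parsed=pd.to_datetime(d1["date"], format="%Y-%m-%d", errors="coerce"))

# Drop the rows whose date could not be parsed
clean = d1.dropna(subset=["parsed"])

# Group by ID and month so the result matches the layout asked for in the question
month_year = clean["parsed"].dt.strftime("%Y-%m").rename("month_year")
counts = clean.groupby(["ID", month_year]).size().reset_index(name="count")
print(counts)
```

Whether to drop, report, or repair unparseable rows depends on the data; dropping is just the simplest choice for the sketch.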
