
Pandas: cut date column into period date groups/bins

I have a DataFrame like this:

df = pd.DataFrame({'Id': ['abs1', 'abs2', 'abs3', 'plo2', '201805', '201806', '202011', 'pctx1'],
                   'Date': ['2021-06-15', '2021-06-13', '2021-06-07', '2021-05-30',
                            '2021-05-12', '2021-04-28', '2021-04-15', '2021-02-01']})

I want to bin the Date column into groups in a new column called Date_Bin. The rule, counting back from today's date: if the date is less than 7 days ago, the new value is 'last 7 days'; if it is more than 7 but less than 14 days ago, the value is '7 to 14 days'; if it is more than 14 but less than 30 days ago, the value is '14 to 30 days'; and the same logic applies for '30 to 60 days', '60 to 90 days', and 'more than 90 days'. The ideal output looks like this:

       Id        Date           Date_Bin
0    abs1  2021-06-15        last 7 days
1    abs2  2021-06-13        last 7 days
2    abs3  2021-06-07       7 to 14 days
3    plo2  2021-05-30      14 to 30 days
4  201805  2021-05-12      30 to 60 days
5  201806  2021-04-28      30 to 60 days
6  202011  2021-04-15      60 to 90 days
7   pctx1  2021-02-01  more than 90 days

As the output shows, those are the only groups/bins I need for the data. I tried a couple of approaches, but none of them worked, so I would much appreciate any help.

Convert your dates with to_datetime, then subtract them from today's normalized date (normalizing removes the time part) and take the number of days. Then use pd.cut to group them appropriately.

Any date in the future gets labeled NaN.

import pandas as pd
import numpy as np

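# Parse the date strings into datetime64 values
df['Date'] = pd.to_datetime(df['Date'])
# Age of each date in whole days, measured from midnight today
s = (pd.to_datetime('today').normalize() - df['Date']).dt.days

# Bins are right-closed: (0, 7], (7, 14], ...; include_lowest also keeps 0 days
df['Date_Bin'] = pd.cut(s, [0, 7, 14, 30, 60, 90, np.inf],
                        labels=['last 7 days', '7 to 14 days', '14 to 30 days',
                                '30 to 60 days', '60 to 90 days', 'more than 90 days'],
                        include_lowest=True)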

print(df)

       Id       Date           Date_Bin
0    abs1 2021-06-15        last 7 days
1    abs2 2021-06-13        last 7 days
2    abs3 2021-06-07       7 to 14 days
3    plo2 2021-05-30      14 to 30 days
4  201805 2021-05-12      30 to 60 days
5  201806 2021-04-28      30 to 60 days
6  202011 2021-04-15      60 to 90 days
7   pctx1 2021-02-01  more than 90 days
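
The question's wording ("less than 7 days") leaves open which bin a date exactly 7 days old belongs to. The sketch below uses a made-up Series of day counts (not data from the question) to compare the right-closed bins used above with left-closed bins obtained through pd.cut's right=False argument. It also shows a negative day count, that is a future date, coming out as NaN in both cases.

```python
import numpy as np
import pandas as pd

# Hypothetical day counts for illustration; -1 stands for a date in the future.
days = pd.Series([-1, 0, 6, 7, 8, 14, 90, 91])

bins = [0, 7, 14, 30, 60, 90, np.inf]
labels = ['last 7 days', '7 to 14 days', '14 to 30 days',
          '30 to 60 days', '60 to 90 days', 'more than 90 days']

# Right-closed intervals (the pd.cut default): exactly 7 days stays in 'last 7 days'.
right_closed = pd.cut(days, bins, labels=labels, include_lowest=True)

# Left-closed intervals: exactly 7 days moves to '7 to 14 days'.
left_closed = pd.cut(days, bins, labels=labels, right=False)

print(pd.DataFrame({'days': days,
                    'right_closed': right_closed,
                    'left_closed': left_closed}))
```

Which variant is appropriate depends on how the boundary days should be counted; the rest of the approach is unchanged.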

For future reproducibility, at the time of writing:

pd.to_datetime('today').normalize()
#Timestamp('2021-06-15 00:00:00')
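
Because the result depends on the day the code runs, one way to make it repeatable is to pass the reference date explicitly. The following is a minimal sketch, not part of the original answer: the function name add_date_bin and its today parameter are hypothetical, introduced only for illustration. It defaults to the current date but accepts a fixed one.

```python
import numpy as np
import pandas as pd

def add_date_bin(df, today=None):
    """Return a copy of df with a Date_Bin column (hypothetical helper).

    `today` is an assumed parameter: the reference date to count back from.
    """
    if today is None:
        today = pd.Timestamp('today')
    today = pd.Timestamp(today).normalize()  # drop the time part

    out = df.copy()
    out['Date'] = pd.to_datetime(out['Date'])
    age_in_days = (today - out['Date']).dt.days

    out['Date_Bin'] = pd.cut(
        age_in_days, [0, 7, 14, 30, 60, 90, np.inf],
        labels=['last 7 days', '7 to 14 days', '14 to 30 days',
                '30 to 60 days', '60 to 90 days', 'more than 90 days'],
        include_lowest=True)
    return out

df = pd.DataFrame({'Id': ['abs1', 'abs2', 'abs3', 'plo2', '201805', '201806', '202011', 'pctx1'],
                   'Date': ['2021-06-15', '2021-06-13', '2021-06-07', '2021-05-30',
                            '2021-05-12', '2021-04-28', '2021-04-15', '2021-02-01']})

# Pin the reference date to the one used when the answer was written.
result = add_date_bin(df, today='2021-06-15')
print(result)
```

With the reference date pinned to 2021-06-15, the bins should match the output shown above regardless of when the code is run.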
