Comparing dates in a list with date ranges in a DataFrame

I'm having difficulty figuring out a way to count the occurrences of holidays within datetime ranges in a DataFrame. The holidays are in a list, while the datetime ranges are in the DataFrame, as shown below (note that this is a subset of a very large data set):

import pandas as pd
from datetime import date

df = pd.DataFrame({'Date': ['2018-12-19 18:47','2019-01-01 06:11','2019-01-12 10:05','2019-02-17 14:22','2019-03-08 16:17','2019-03-25 17:35','2019-02-14 17:35'],
              'End Date': ['2018-12-28 18:47','2019-01-05 06:11','2019-01-16 10:05','2019-02-19 14:22','2019-03-12 16:17','2019-03-26 17:35','2019-05-27 17:35']})

df['Date'] = pd.to_datetime(df['Date'])
df['End Date'] = pd.to_datetime(df['End Date'])

Holidays = [date(2018,12,24),date(2018,12,25),date(2019,1,1),date(2019,1,21),date(2019,2,18),date(2019,3,8),date(2019,5,27)]

I've been able to find a way to determine whether or not any holiday falls within each datetime range, but not how to get an actual count.

Is there a way to alter the code below to gather the count rather than boolean values?

This is what I've tried so far:

df['Holidays'] = [any([(z>=x)&(z<=y) for z in Holidays]) for x , y in zip(df['Date'].dt.date,df['End Date'].dt.date)]

The result I'm looking for is as follows:

result = pd.DataFrame({'Date': ['2018-12-19 18:47','2019-01-01 06:11','2019-01-12 10:05','2019-02-17 14:22','2019-03-08 16:17','2019-03-25 17:35','2019-02-14 17:35'],
                   'End Date': ['2018-12-28 18:47','2019-01-05 06:11','2019-01-16 10:05','2019-02-19 14:22','2019-03-12 16:17','2019-03-26 17:35','2019-05-27 17:35'],
                   'Holidays': [2,1,0,1,1,0,3]})
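The attempt above is very close to a count already: Python treats True as 1 in arithmetic, so a minimal sketch is to replace any() with sum(). Because both sides of each comparison are datetime.date objects (via .dt.date), a holiday that falls on the start date is counted, which should reproduce the result listed above. The sample data is rebuilt so the snippet runs on its own.

```python
from datetime import date

import pandas as pd

df = pd.DataFrame({
    'Date': ['2018-12-19 18:47', '2019-01-01 06:11', '2019-01-12 10:05', '2019-02-17 14:22',
             '2019-03-08 16:17', '2019-03-25 17:35', '2019-02-14 17:35'],
    'End Date': ['2018-12-28 18:47', '2019-01-05 06:11', '2019-01-16 10:05', '2019-02-19 14:22',
                 '2019-03-12 16:17', '2019-03-26 17:35', '2019-05-27 17:35'],
})
df['Date'] = pd.to_datetime(df['Date'])
df['End Date'] = pd.to_datetime(df['End Date'])

Holidays = [date(2018, 12, 24), date(2018, 12, 25), date(2019, 1, 1), date(2019, 1, 21),
            date(2019, 2, 18), date(2019, 3, 8), date(2019, 5, 27)]

# Same comprehension as the attempt, but sum() adds up the True values
# instead of any() collapsing them into a single boolean.
df['Holidays'] = [sum(x <= z <= y for z in Holidays)
                  for x, y in zip(df['Date'].dt.date, df['End Date'].dt.date)]

print(df)
```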

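We can write a function that checks this condition for every holiday and then apply it row-wise; summing the booleans gives the count. Each holiday is converted to a pd.Timestamp before comparing, because pandas 2.0 and later raise a TypeError when a Timestamp is ordered against a datetime.date.

def fn(series):
    # iloc[0] is 'Date', iloc[1] is 'End Date'; each True adds 1 to the sum
    return sum(series.iloc[0] <= pd.Timestamp(h) <= series.iloc[1] for h in Holidays)

df.assign(Holidays=df.apply(fn, axis=1))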

                 Date            End Date  Holidays
0 2018-12-19 18:47:00 2018-12-28 18:47:00         2
1 2019-01-01 06:11:00 2019-01-05 06:11:00         0
2 2019-01-12 10:05:00 2019-01-16 10:05:00         0
3 2019-02-17 14:22:00 2019-02-19 14:22:00         1
4 2019-03-08 16:17:00 2019-03-12 16:17:00         0
5 2019-03-25 17:35:00 2019-03-26 17:35:00         0
6 2019-02-14 17:35:00 2019-05-27 17:35:00         3

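Your desired output differs from this because the dates in the Holidays list carry no time of day, so each holiday is compared as midnight of that day. For example, the 2019-01-01 holiday (2019-01-01 00:00) falls before the range that starts at 2019-01-01 06:11, so it is not counted. To get the output that you posted, we have to round the range boundaries down to the day before comparing.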
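The midnight effect can be checked on a single row. This small sketch uses the second row of the sample data; the variable names are only for illustration. Converting the date to a Timestamp places it at 00:00, which sorts before a 06:11 start until the start is floored to the day.

```python
from datetime import date

import pandas as pd

start = pd.Timestamp('2019-01-01 06:11')
end = pd.Timestamp('2019-01-05 06:11')
holiday = pd.Timestamp(date(2019, 1, 1))  # a date has no time, so this is 2019-01-01 00:00

# Without flooring, midnight falls before the 06:11 start, so the holiday is missed.
unfloored = start <= holiday <= end
# floor('D') rounds the start down to 2019-01-01 00:00, so the holiday is included.
floored = start.floor('D') <= holiday <= end.floor('D')

print(unfloored, floored)
```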

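def fn(series):
    # floor('D') drops the time of day, so a holiday on the start date is counted
    start, end = series.iloc[0].floor('D'), series.iloc[1].floor('D')
    return sum(start <= pd.Timestamp(h) <= end for h in Holidays)

df.assign(Holidays=df.apply(fn, axis=1))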

                 Date            End Date  Holidays
0 2018-12-19 18:47:00 2018-12-28 18:47:00         2
1 2019-01-01 06:11:00 2019-01-05 06:11:00         1
2 2019-01-12 10:05:00 2019-01-16 10:05:00         0
3 2019-02-17 14:22:00 2019-02-19 14:22:00         1
4 2019-03-08 16:17:00 2019-03-12 16:17:00         1
5 2019-03-25 17:35:00 2019-03-26 17:35:00         0
6 2019-02-14 17:35:00 2019-05-27 17:35:00         3
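Since the full data set is very large, the row-wise apply (which runs Python code once per row and once per holiday) may be slow. One vectorized alternative, which is a different technique from the answer above and offered only as a sketch, is to sort the holidays once and use numpy.searchsorted: for each day-floored range, the number of holidays in [start, end] equals the insertion index of end with side='right' minus the insertion index of start with side='left'. It assumes the same day-floored semantics as the second version; the names hol, start, end and counts are illustrative.

```python
from datetime import date

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2018-12-19 18:47', '2019-01-01 06:11', '2019-01-12 10:05', '2019-02-17 14:22',
                            '2019-03-08 16:17', '2019-03-25 17:35', '2019-02-14 17:35']),
    'End Date': pd.to_datetime(['2018-12-28 18:47', '2019-01-05 06:11', '2019-01-16 10:05', '2019-02-19 14:22',
                                '2019-03-12 16:17', '2019-03-26 17:35', '2019-05-27 17:35']),
})
Holidays = [date(2018, 12, 24), date(2018, 12, 25), date(2019, 1, 1), date(2019, 1, 21),
            date(2019, 2, 18), date(2019, 3, 8), date(2019, 5, 27)]

# Sorted holidays as datetime64 values (midnight of each day); searchsorted needs sorted input.
hol = np.sort(pd.to_datetime(Holidays).to_numpy())

# Floor both boundaries to the day, matching the second apply() version.
start = df['Date'].dt.floor('D').to_numpy()
end = df['End Date'].dt.floor('D').to_numpy()

# (holidays <= end) minus (holidays < start) = holidays inside [start, end].
counts = np.searchsorted(hol, end, side='right') - np.searchsorted(hol, start, side='left')

df = df.assign(Holidays=counts)
print(df)
```

Sorting happens once, and each row then costs two binary searches over the holiday array, so the cost grows roughly with the number of rows times the logarithm of the number of holidays rather than with their product.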
