
Count the number of days between two dates, not counting weekends and holidays

I've got these dates, 04/02/2020 and 30/06/2020, and I want to check how many days are between them skipping specified dates like 25th December or 1st May, as well as weekends.

For example, between the above two dates are 147 days (the end date doesn't count in this), but there are 21 weekends in between those dates so there are only 105 workdays. And if Friday 1st May is a holiday, then the final answer would be 104 workdays.

I've done the following to skip weekends, but I'm still lost on how to skip holidays. Is there a way to create a sort of 'blacklist', so that if the difference passes through any day in that list it subtracts one day? At first I thought of using a dictionary, but I don't know how that would work.

This is the weekend 'fix':

import math
from datetime import datetime

date_input = '4/2/2020'
date_end = '30/6/2020'
# the format string must use the same separator as the input ('/')
start = datetime.strptime(date_input, "%d/%m/%Y").date()
end = datetime.strptime(date_end, "%d/%m/%Y").date()

Gap = (end - start).days
N_weeks = Gap / 7
weekends = (math.trunc(N_weeks)) * 2

final_result = str((Gap) - weekends)

How can I remove holiday dates from this count?

If you have a list of dates that should be skipped, then you can test if any of them fall within the range of your start and end dates. date objects are orderable, so you can use:

from datetime import date

# list of holiday dates
dates_to_skip = [date(2020, 5, 1), date(2020, 12, 25)]

skip_count = 0
for to_skip in dates_to_skip:
    if start <= to_skip < end:
        skip_count += 1

The start <= to_skip < end chained comparison is only true if the to_skip date falls between the two values. For your example dates, that would only be the case for May 1st:

>>> from datetime import date
>>> start = date(2020, 2, 4)
>>> end = date(2020, 6, 30)
>>> dates_to_skip = [date(2020, 5, 1), date(2020, 12, 25)]
>>> for to_skip in dates_to_skip:
...     if start <= to_skip < end:
...         print(f"{to_skip} falls between {start} and {end}")
...
2020-05-01 falls between 2020-02-04 and 2020-06-30
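The same count can also be written as a single expression. This is only a compact sketch of the loop above, relying on the fact that Python booleans add up as 0 and 1; the variable names are the ones already used in this answer:

```python
from datetime import date

start = date(2020, 2, 4)
end = date(2020, 6, 30)
dates_to_skip = [date(2020, 5, 1), date(2020, 12, 25)]

# Each comparison is True (1) or False (0), so the sum is the number of matches
skip_count = sum(start <= to_skip < end for to_skip in dates_to_skip)
print(skip_count)  # 1
```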

If your list of dates to skip is large, the above might take too long to process; testing each and every date in the list individually is not efficient.

In that case you want to use bisection to quickly determine the number of matching dates between start and end. Keep the list of dates to skip in sorted order, then use the bisect module to find the indexes where you'd insert start and end; the difference between those two indexes is the number of matching dates you want to subtract from your range count:

from bisect import bisect_left

def count_skipped(start, end, dates_to_skip):
    """Count how many dates in dates_to_skip fall between start and end

    start is inclusive, end is exclusive

    """
    if start >= end:
        return 0
    start_idx = bisect_left(dates_to_skip, start)
    end_idx = bisect_left(dates_to_skip, end, lo=start_idx)
    return end_idx - start_idx

Note that bisect.bisect_left() gives you the index at which all values in dates_to_skip[start_idx:] are equal to or higher than the start date. For the end date, all values in dates_to_skip[:end_idx] are going to be lower (dates_to_skip[end_idx] itself could be equal to end, but end is excluded). Once you know the index for the start date, you can tell bisect_left() to skip all values up to start_idx when searching for the index for the end date, because the end date is higher than the start date (although the value at dates_to_skip[start_idx] could be higher than both start and end). The difference between those two bisect_left() results is the number of dates that fall between the start and end.

The advantage of using bisect is that it takes O(log N) steps to count how many dates out of a list of N dates fall between start and end, while the simple for to_skip in dates_to_skip: loop above takes O(N) steps. That doesn't matter if there are 5 or 10 dates to test, but with a thousand dates it starts to matter that the bisect method needs only about 10 steps, not a thousand.
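As a rough sanity check, both approaches can be run over the same larger list to confirm that they agree. The list below is made-up sample data (the first day of every month for a century), not real holidays; it is generated in ascending order, so it is already sorted as bisect requires:

```python
from bisect import bisect_left
from datetime import date

# Hypothetical sample data: 1200 dates, generated in ascending order
dates_to_skip = [date(y, m, 1) for y in range(2000, 2100) for m in range(1, 13)]

start = date(2020, 2, 4)
end = date(2020, 6, 30)

# O(N): test every date in the list
linear = sum(start <= d < end for d in dates_to_skip)

# O(log N): two binary searches on the sorted list
start_idx = bisect_left(dates_to_skip, start)
end_idx = bisect_left(dates_to_skip, end, lo=start_idx)
bisected = end_idx - start_idx

print(len(dates_to_skip), linear, bisected)  # 1200 4 4
```

The four matches are the first of March, April, May and June 2020.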

Note that your weekend-counting calculation is not correct; it is too simplistic. Here is an example showing that the number of weekend days differs for two different periods of 11 days, while your approach would count 2 weekend days for either example:

Say your start date is a Monday, and your end date is the Friday one week further, you have just 1 weekend in between and so have 11 - 2 = 9 weekdays (not counting the end date):

| M   | T | W | T | F   |  S  |  S  |
|-----|---|---|---|-----|-----|-----|
| [1] | 2 | 3 | 4 |  5  | _1_ | _2_ |
|  6  | 7 | 8 | 9 | (E) |     |     |

In the above table, [1] is the start date, (E) is the end date, and the numbers count the work days; the skipped weekend days are counted with the _1_, _2_ numbers.

But if the start day is a Friday, and the end day is the Tuesday in the second week following, then you have the same number of whole days between start and end, but now you have to subtract two weekends; there are only 7 workdays between those two days:

| M | T   | W | T | F   |  S  |  S  |
|---|-----|---|---|-----|-----|-----|
|   |     |   |   | [1] | _1_ | _2_ |
| 2 |  3  | 4 | 5 |  6  | _3_ | _4_ |
| 7 | (E) |   |   |     |     |     |
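The two periods in the tables above can be checked with a short script. The concrete dates are picked for illustration only (3 February 2020 is a Monday, 7 February 2020 a Friday), and both function names are hypothetical helpers; the first reproduces the calculation from the question, the second simply walks every day:

```python
import math
from datetime import date, timedelta

def naive_weekend_days(start, end):
    # The approach from the question: whole weeks in the gap, times two
    return math.trunc((end - start).days / 7) * 2

def brute_force_weekend_days(start, end):
    # Walk every day in [start, end) and count Saturdays (5) and Sundays (6)
    days = (end - start).days
    return sum((start + timedelta(days=i)).weekday() >= 5 for i in range(days))

periods = [
    (date(2020, 2, 3), date(2020, 2, 14)),  # Monday to Friday one week further
    (date(2020, 2, 7), date(2020, 2, 18)),  # Friday to Tuesday in the second week
]
for s, e in periods:
    print((e - s).days, naive_weekend_days(s, e), brute_force_weekend_days(s, e))
# 11 2 2
# 11 2 4
```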

So counting the number of days between the start and end and then dividing that number by 7 is not the correct way to count weeks or weekends here. To count whole weekends, find the nearest Saturdays (going forward) from both the start and end date, so you end up with two dates that are a multiple of 7 days apart. Dividing that number by 7 will give you the actual number of whole weekends between the two days. Then adjust the weekend-day total if either the start or end date fell on a Sunday before moving (when the start is a Sunday, add one day to the total; when the end is a Sunday, subtract one day from the total).

You can find the nearest Saturday from any given date by taking the date.weekday() value, subtracting that from 5, and taking that value modulo 7 as the number of days to add. This will always give you the right value for any given day of the week: for weekdays (0 - 4), 5 - date.weekday() is the positive number of days to skip to get to Saturday; for Saturday (5) the result is 0 (no days to skip); and for Sunday (6), 5 - 6 is -1, but the % 7 modulo operation turns that into (7 - 1), so 6 days.
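To see the offsets for a full week, here is a small sketch that starts from an arbitrarily chosen Monday (3 February 2020) and computes the number of days to add for each day through Sunday:

```python
from datetime import date, timedelta

monday = date(2020, 2, 3)  # example date; weekday() == 0

offsets = []
for i in range(7):
    day = monday + timedelta(days=i)
    # days to add to reach the nearest Saturday, going forward
    offsets.append((5 - day.weekday()) % 7)

print(offsets)  # [5, 4, 3, 2, 1, 0, 6]
```

Python's % operator returns a non-negative result for a positive divisor, which is why the Sunday case (-1 % 7) comes out as 6 rather than -1.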

The following function implements these tricks to get you the right number of weekend days between any two dates start and end, where start is lower than end:

from datetime import timedelta

def count_weekend_days(start, end):
    """Count the number of weekend days (Saturday, Sunday)

    Start is inclusive, end is exclusive.

    """
    if start >= end:
        return 0

    # If either start or end are a Sunday, count these manually
    # Boolean results have either a 0 (false) or 1 (true) integer
    # value, so we can do arithmetic with these:
    boundary_sundays = (start.weekday() == 6) - (end.weekday() == 6)

    # find the nearest Saturday from the start and end, going forward
    start += timedelta(days=(5 - start.weekday()) % 7)
    end += timedelta(days=(5 - end.weekday()) % 7)

    # start and end are Saturdays, the difference between
    # these days is going to be a whole multiple of 7.
    # Floor division by 7 gives the number of whole weekends
    weekends = (end - start).days // 7
    return boundary_sundays + (weekends * 2)

The adjustment logic may need a bit more explaining. Moving both boundaries forward instead of moving the start forwards and the end backwards in time, is much easier to handle; there are no other adjustments in counts needed, while at the same time making it trivial to count whole weekends between the two dates.

If both start and end are weekdays (their date.weekday() method result is a value between 0 and 4) then moving either to the next Saturday will keep the same number of whole weekends between the two dates, no matter what weekday they started at. Moving dates forward this way doesn't skew the weekend day count, but does make it much easier to get a correct number.

If start falls on a Sunday, moving forward to the next Saturday would need to account for this skipped Sunday separately; it's a half weekend you'd want to include in the result so you want to add 1 to the total. If end falls on a Sunday, then that day shouldn't count in the total (the end date is exclusive in the range), but moving to the next Saturday would include it in the count, so you want to subtract this extra weekend day.

In the code above I simply use two boolean tests with subtraction to do the initial boundary_sundays value calculation. In Python the bool type is a subclass of int, and False and True have integer values. Subtracting two booleans gives you an integer value. boundary_sundays is going to be -1, 0 or 1, depending on how many Sundays we find.
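The three possible outcomes can be reproduced in isolation. The helper below is only an illustrative wrapper around the same expression, and the dates are examples (9 and 16 February 2020 are Sundays):

```python
from datetime import date

def boundary_sundays(start, end):
    # bool is a subclass of int: True - False == 1, False - True == -1
    return (start.weekday() == 6) - (end.weekday() == 6)

sunday = date(2020, 2, 9)
next_sunday = date(2020, 2, 16)
tuesday = date(2020, 2, 11)
thursday = date(2020, 2, 13)

print(boundary_sundays(sunday, tuesday))      # 1: start is a Sunday
print(boundary_sundays(tuesday, next_sunday)) # -1: end is a Sunday
print(boundary_sundays(sunday, next_sunday))  # 0: both cancel out
print(boundary_sundays(tuesday, thursday))    # 0: no Sundays on the boundaries
```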

Putting these together:

def count_workdays(start, end, holidays):
    """Count the number of workdays between start and end.

    Workdays are dates that fall on Monday through to Friday.

    start and end are datetime.date objects. holidays is a sorted
    list of date objects that should *not* count as workdays; it is assumed
    that all dates in this list fall on Monday through to Friday;
    if there are any weekend days in this list the workday count
    may be incorrect as weekend days will be subtracted more than once.

    Start is inclusive, end exclusive.

    """
    if start >= end:
        return 0
    count = (end - start).days
    count -= count_skipped(start, end, holidays)
    count -= count_weekend_days(start, end)

    return count

Demo:

>>> start = date(2020, 2, 4)
>>> end = date(2020, 6, 30)
>>> holidays = [date(2020, 5, 1), date(2020, 12, 25)]  # in sorted order
>>> count_workdays(start, end, holidays)
104
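If you want to cross-check the result, a brute-force counter that inspects every day individually is easy to write. This is a sketch for verification, not a replacement: it takes O(days) steps, and the function name is hypothetical. Because it tests each day against a set, it also tolerates holidays that fall on a weekend and does not need the list to be sorted:

```python
from datetime import date, timedelta

def count_workdays_brute_force(start, end, holidays):
    """Count workdays in [start, end) by checking every day individually."""
    skip = set(holidays)  # fast membership tests; order is irrelevant
    count = 0
    day = start
    while day < end:
        # Monday to Friday have weekday() values 0 to 4
        if day.weekday() < 5 and day not in skip:
            count += 1
        day += timedelta(days=1)
    return count

start = date(2020, 2, 4)
end = date(2020, 6, 30)
holidays = [date(2020, 5, 1), date(2020, 12, 25)]
print(count_workdays_brute_force(start, end, holidays))  # 104

# A holiday on a Saturday (2 May 2020) is not subtracted twice
with_saturday = holidays + [date(2020, 5, 2)]
print(count_workdays_brute_force(start, end, with_saturday))  # 104
```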
