简体   繁体   English

如何使用 Pandas 计算两个日期之间的工作日数

[英]How to calculate the quantity of business days between two dates using Pandas

I created a pandas df with columns named start_date and current_date.我创建了一个带有名为 start_date 和 current_date 的列的 pandas df。 Both columns have a dtype of datetime64[ns].两列的数据类型均为 datetime64[ns]。 What's the best way to find the quantity of business days between the current_date and start_date column?查找 current_date 和 start_date 列之间的工作日数量的最佳方法是什么?

I've tried:我试过了:

from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

us_bd = CustomBusinessDay(calendar=USFederalHolidayCalendar())

projects_df['start_date'] = pd.to_datetime(projects_df['start_date'])
projects_df['current_date'] = pd.to_datetime(projects_df['current_date'])

projects_df['days_count'] = len(pd.date_range(start=projects_df['start_date'], end=projects_df['current_date'], freq=us_bd))

I get the following error message:我收到以下错误消息:

Cannot convert input....start_date, dtype: datetime64[ns]] of type <class 'pandas.core.series.Series'> to Timestamp

I'm using Python version 3.10.4.我正在使用 Python 3.10.4 版。

pd.date_range 's parameters need to be datetimes, not series. pd.date_range的参数需要是日期时间,而不是系列。
For this reason, we can use df.apply to apply the function to each row.出于这个原因,我们可以使用df.apply将函数应用于每一行。
In addition, pandas has bdate_range which is just date_range with freq defaulting to business days, which is exactly what you need.此外,pandas 有bdate_range ,它只是date_rangefreq默认为工作日,这正是您所需要的。
Using apply and a lambda function, we can create a new Series calculating business days between each start and current date for each row.使用 apply 和 lambda 函数,我们可以创建一个新的 Series,计算每行的每个开始日期和当前日期之间的工作日。

projects_df['start_date'] = pd.to_datetime(projects_df['start_date'])
projects_df['current_date'] = pd.to_datetime(projects_df['current_date'])

projects_df['days_count'] = projects_df.apply(lambda row: len(pd.bdate_range(row['start_date'], row['current_date'])), axis=1)

Using a random sample of 10 date pairs, my output is the following:使用 10 个日期对的随机样本,我的输出如下:

           start_date        current_date  bdays
0 2022-01-03 17:08:04 2022-05-20 00:53:46    100
1 2022-04-18 09:43:02 2022-06-10 16:56:16     40
2 2022-09-01 12:02:34 2022-09-25 14:59:29     17
3 2022-04-02 14:24:12 2022-04-24 21:05:55     15
4 2022-01-31 02:15:46 2022-07-02 16:16:02    110
5 2022-08-02 22:05:15 2022-08-17 17:25:10     12
6 2022-03-06 05:30:20 2022-07-04 08:43:00     86
7 2022-01-15 17:01:33 2022-08-09 21:48:41    147
8 2022-06-04 14:47:53 2022-12-12 18:05:58    136
9 2022-02-16 11:52:03 2022-10-18 01:30:58    175

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM