
How to calculate the number of days until the weekend or a day off in a pandas DataFrame

I have a pandas DataFrame with a non-continuous date index (weekends and holidays are missing). I want to add a column containing the number of days until the next day off.

Here is code that generates an example DataFrame, with the desired values in the till_day_off column:

import pandas as pd
df = pd.DataFrame(index=pd.date_range(start="2022-06-06", periods=15))
df["day_of_week"] = df.index.dayofweek   # adding column with number of day in a week
df = df[(df.day_of_week < 5)]   # remove weekends
df = df.drop(index="2022-06-15")   # remove Wednesday in second week
df["till_day_off"] = [5,4,3,2,1,2,1,2,1,1] # desired values, end of column is treated as day off

Resulting DataFrame:

            day_of_week  till_day_off
2022-06-06            0             5
2022-06-07            1             4
2022-06-08            2             3
2022-06-09            3             2
2022-06-10            4             1
2022-06-13            0             2
2022-06-14            1             1
2022-06-16            3             2
2022-06-17            4             1
2022-06-20            0             1

The real DataFrame has over 7,000 rows, so I am trying to avoid iterating over rows. Any idea how to tackle this?

Assuming the input is sorted (if not, sort it by date), you can build a mask that identifies consecutive days, use it to group them, and compute a cumcount:

mask = (-df.index.to_series().diff(-1)).eq('1d').iloc[::-1]
# reversing the Series to count until (not since) the value

df['till_day_off'] = mask.groupby((~mask).cumsum()).cumcount().add(1)

Output:

            day_of_week  till_day_off
2022-06-06            0             5
2022-06-07            1             4
2022-06-08            2             3
2022-06-09            3             2
2022-06-10            4             1
2022-06-13            0             2
2022-06-14            1             1
2022-06-16            3             2
2022-06-17            4             1
2022-06-20            0             1

Intermediates:

mask

2022-06-20    False
2022-06-17    False
2022-06-16     True
2022-06-14    False
2022-06-13     True
2022-06-10    False
2022-06-09     True
2022-06-08     True
2022-06-07     True
2022-06-06     True
dtype: bool

(~mask).cumsum()

2022-06-20    1
2022-06-17    2
2022-06-16    2
2022-06-14    3
2022-06-13    3
2022-06-10    4
2022-06-09    4
2022-06-08    4
2022-06-07    4
2022-06-06    4
dtype: int64
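
For comparison, the same grouping idea can be written without reversing the Series, by counting backwards inside each run of consecutive days. This is only a sketch under the same assumption of a sorted date index. The names `new_run` and `run_id` are illustrative and do not come from the answer above.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame(index=pd.date_range(start="2022-06-06", periods=15))
df["day_of_week"] = df.index.dayofweek
df = df[df["day_of_week"] < 5]                   # remove weekends
df = df.drop(index=pd.Timestamp("2022-06-15"))   # remove Wednesday in second week

dates = df.index.to_series()

# True where the previous row is not exactly one day earlier,
# i.e. where a new run of consecutive days starts (the first row included).
new_run = dates.diff().ne(pd.Timedelta(days=1))

# Label each run of consecutive days with its own integer id.
run_id = new_run.cumsum()

# Count backwards inside each run so that the last day of a run gets 1.
df["till_day_off"] = dates.groupby(run_id).cumcount(ascending=False).add(1)

print(df)
```

Like the mask version, this treats the end of the frame as a day off, so the last row gets 1.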

Create a DataFrame of the missing dates, then use merge_asof to match each row with the closest missing date in the future and calculate the time until that day off.

Here I assume days off are just the missing dates, but this extends to the case where you have an explicit list of dates you want to use.

import pandas as pd

# DataFrame of missing dates, e.g. days off.
df1 = pd.DataFrame({'day_off': pd.date_range(df.index.min(), df.index.max()+pd.offsets.DateOffset(days=1), freq='D')})
df1 = df1[~df1['day_off'].isin(df.index)]

df = pd.merge_asof(df, df1, left_index=True, right_on='day_off', direction='forward')
df['till_day_off'] = (df['day_off'] - df.index).dt.days

print(df)

            day_of_week    day_off  till_day_off
2022-06-06            0 2022-06-11             5
2022-06-07            1 2022-06-11             4
2022-06-08            2 2022-06-11             3
2022-06-09            3 2022-06-11             2
2022-06-10            4 2022-06-11             1
2022-06-13            0 2022-06-15             2
2022-06-14            1 2022-06-15             1
2022-06-16            3 2022-06-18             2
2022-06-17            4 2022-06-18             1
2022-06-20            0 2022-06-21             1
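
As a rough illustration of the explicit-list case mentioned above, the sketch below looks up the next day off with `DatetimeIndex.searchsorted` rather than `merge_asof`. The contents of `days_off` are a made-up list for this example. It assumes the list is sorted and contains at least one date after the last row, since otherwise the lookup would run past the end of the list.

```python
import pandas as pd

# Working days from the question's example (weekends and 2022-06-15 absent).
df = pd.DataFrame(index=pd.date_range(start="2022-06-06", periods=15))
df["day_of_week"] = df.index.dayofweek
df = df[df["day_of_week"] < 5]
df = df.drop(index=pd.Timestamp("2022-06-15"))

# Hypothetical explicit list of days off (assumed sorted, and extending
# past the last row of df so every row has a following day off).
days_off = pd.DatetimeIndex(
    ["2022-06-11", "2022-06-12", "2022-06-15",
     "2022-06-18", "2022-06-19", "2022-06-21"]
)

# For each row, position of the first day off on or after that date.
pos = days_off.searchsorted(df.index, side="left")
next_off = days_off[pos]

df["day_off"] = next_off
df["till_day_off"] = (next_off - df.index).days

print(df)
```

If a date in the index were itself in `days_off`, `side="left"` would give it a distance of 0. Whether that is the desired behaviour depends on the use case.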
