How to calculate the number of days until the weekend or a day off in a pandas DataFrame
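I have a pandas DataFrame with a non-continuous date index (weekends and holidays are missing). I want to add a column containing the number of days until the next day off.
Here is code that generates an example DataFrame, with the desired values in the till_day_off column: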
import pandas as pd
df = pd.DataFrame(index=pd.date_range(start="2022-06-06", periods=15))
df["day_of_week"] = df.index.dayofweek # adding column with number of day in a week
df = df[(df.day_of_week < 5)] # remove weekends
df = df.drop(index="2022-06-15") # remove Wednesday in second week
df["till_day_off"] = [5,4,3,2,1,2,1,2,1,1] # desired values, end of column is treated as day off
Resulting DataFrame:
date | day_of_week | till_day_off |
---|---|---|
2022-06-06 | 0 | 5 |
2022-06-07 | 1 | 4 |
2022-06-08 | 2 | 3 |
2022-06-09 | 3 | 2 |
2022-06-10 | 4 | 1 |
2022-06-13 | 0 | 2 |
2022-06-14 | 1 | 1 |
2022-06-16 | 3 | 2 |
2022-06-17 | 4 | 1 |
2022-06-20 | 0 | 1 |
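The real DataFrame has more than 7_000 rows, so obviously I am trying to avoid iterating over rows. Any idea how to solve this?
Assuming a sorted input (if not, sort by date first), you can use a mask to identify consecutive days, then use it to group them and compute a cumcount: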
mask = (-df.index.to_series().diff(-1)).eq('1d').iloc[::-1]
# reversing the Series to count until (not since) the value
df['till_day_off'] = mask.groupby((~mask).cumsum()).cumcount().add(1)
output:
day_of_week till_day_off
2022-06-06 0 5
2022-06-07 1 4
2022-06-08 2 3
2022-06-09 3 2
2022-06-10 4 1
2022-06-13 0 2
2022-06-14 1 1
2022-06-16 3 2
2022-06-17 4 1
2022-06-20 0 1
Intermediates:
mask
2022-06-20 False
2022-06-17 False
2022-06-16 True
2022-06-14 False
2022-06-13 True
2022-06-10 False
2022-06-09 True
2022-06-08 True
2022-06-07 True
2022-06-06 True
dtype: bool
(~mask).cumsum()
2022-06-20 1
2022-06-17 2
2022-06-16 2
2022-06-14 3
2022-06-13 3
2022-06-10 4
2022-06-09 4
2022-06-08 4
2022-06-07 4
2022-06-06 4
dtype: int64
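The answer above assumes a sorted index. As a minimal sketch of how the same mask and cumcount idea might be wrapped so that it also tolerates unsorted input, the hypothetical helper below (days_till_day_off is not part of pandas or of the original answer) sorts first and then restores the caller's order. It assumes the index has no duplicate dates.

```python
import pandas as pd


def days_till_day_off(index: pd.DatetimeIndex) -> pd.Series:
    """Hypothetical helper: rows remaining until the next gap in the dates."""
    # Sort so that diff(-1) always compares a date with the next later date
    dates = index.to_series().sort_index()
    # current - next equals -1 day when the following calendar day is present;
    # the last row yields NaT and therefore False (end of data is a day off)
    consecutive = dates.diff(-1).eq(pd.Timedelta(days=-1)).iloc[::-1]
    # In reversed order every False starts a new run; number the rows per run
    counts = consecutive.groupby((~consecutive).cumsum()).cumcount().add(1)
    # Return the counts in the caller's original row order
    return counts.reindex(index)


# Rebuild the example index from the question
idx = pd.date_range(start="2022-06-06", periods=15)
idx = idx[idx.dayofweek < 5].drop(pd.Timestamp("2022-06-15"))

sorted_result = days_till_day_off(idx)
reversed_result = days_till_day_off(idx[::-1])  # deliberately unsorted input
```

Because the counts are reindexed at the end, the result can be assigned straight back with df["till_day_off"] = days_till_day_off(df.index).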
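Create a DataFrame of the missing dates, then use merge_asof
to match each row with the closest future date and compute the time until that day off.
Here I assume that days off are simply the missing dates, but this extends to the case where you have an explicit list of dates to use.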
import pandas as pd
# DataFrame of missing dates, e.g. days off.
df1 = pd.DataFrame({'day_off': pd.date_range(df.index.min(), df.index.max()+pd.offsets.DateOffset(days=1), freq='D')})
df1 = df1[~df1['day_off'].isin(df.index)]
df = pd.merge_asof(df, df1, left_index=True, right_on='day_off', direction='forward')
df['till_day_off'] = (df['day_off'] - df.index).dt.days
print(df)
day_of_week day_off till_day_off
2022-06-06 0 2022-06-11 5
2022-06-07 1 2022-06-11 4
2022-06-08 2 2022-06-11 3
2022-06-09 3 2022-06-11 2
2022-06-10 4 2022-06-11 1
2022-06-13 0 2022-06-15 2
2022-06-14 1 2022-06-15 1
2022-06-16 3 2022-06-18 2
2022-06-17 4 2022-06-18 1
2022-06-20 0 2022-06-21 1
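The answer mentions that the same approach extends to an explicit list of days off. A rough sketch of that variant follows; the days_off calendar here is made up for illustration (it simply lists the gaps from the example), and the names work, left and merged are hypothetical. Both key columns are cast to the same datetime dtype because merge_asof expects matching key dtypes, and the right-hand keys must be sorted.

```python
import pandas as pd

# Working days from the question: no weekends, 2022-06-15 removed
work = pd.DataFrame(index=pd.date_range(start="2022-06-06", periods=15))
work["day_of_week"] = work.index.dayofweek
work = work[work["day_of_week"] < 5].drop(index=pd.Timestamp("2022-06-15"))

# Hypothetical explicit calendar of days off, sorted ascending
days_off = pd.DataFrame({"day_off": pd.to_datetime([
    "2022-06-11", "2022-06-12", "2022-06-15",
    "2022-06-18", "2022-06-19", "2022-06-21",
])})
days_off["day_off"] = days_off["day_off"].astype("datetime64[ns]")

# Move the index into a column so both sides are joined on plain columns
left = work.rename_axis("date").reset_index()
left["date"] = left["date"].astype("datetime64[ns]")

# direction="forward" picks the first day off on or after each working day
merged = pd.merge_asof(left, days_off, left_on="date",
                       right_on="day_off", direction="forward")
merged["till_day_off"] = (merged["day_off"] - merged["date"]).dt.days
result = merged.set_index("date")
```

With a real holiday calendar, any working day later than the last listed day off would get NaT in day_off, so the calendar should extend past the end of the data.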