Is there a quick way to check whether a date lies within n days (say 7) of any date in a list of dates?
I am working with the following dataset:
Date |
---|
2016-01-04 |
2016-01-05 |
2016-01-06 |
2016-01-07 |
2016-01-08 |
and a list holidays = ['2016-01-01','2016-01-18'....'2017-11-23','2017-12-25']
Goal: create a column indicating whether a given date is within +-7 days of any holiday in the list.
Mock output:
Date | Within a week of Holiday |
---|---|
2016-01-04 | 1 |
2016-01-05 | 1 |
2016-01-06 | 1 |
2016-01-07 | 1 |
2016-01-08 | 0 |
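I am dealing with a large number of date records, so I am trying to find a fast (optimized) way to do this.
My current solution:
One way I thought of to do this quickly is to build another list containing only the unique dates in the period I care about (say 2 years). That way I can implement a simple solution with 2 for loops that checks whether each date is within +-7 days of a holiday, and it will not be computationally heavy because both lists are relatively small (730 unique dates and about 20 dates in the holiday list). Once I have the list of dates I want, all I have to do is run a single check on my "Date" column to see whether each date is part of this new list. Still, are there any suggestions for doing this faster?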
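The lookup idea described above can be sketched roughly as follows. This is a minimal illustration and not the asker's actual code: the function name `build_window_dates`, its `window` parameter, and the shortened two-entry `holidays` list are assumptions made for the example. It loops over holidays and day offsets rather than over every unique date, and it uses a 6-day window so that the result agrees with the mock output, where 2016-01-08 (exactly 7 days after 2016-01-01) is marked 0.

```python
from datetime import date, timedelta

# Shortened, hypothetical holiday list; the full list is elided in the question.
holidays = ["2016-01-01", "2016-01-18"]

def build_window_dates(holidays, window=6):
    """Return the set of ISO date strings within +-window days of any holiday."""
    near = set()
    for h in holidays:
        center = date.fromisoformat(h)
        for offset in range(-window, window + 1):
            near.add((center + timedelta(days=offset)).isoformat())
    return near

near_holiday = build_window_dates(holidays)

# Membership in a set is a constant-time check per date.
dates = ["2016-01-04", "2016-01-05", "2016-01-06", "2016-01-07", "2016-01-08"]
flags = [int(d in near_holiday) for d in dates]
print(flags)
```

With a pandas column, the same set could presumably be passed to `df['Date'].isin(near_holiday)` as long as the column holds strings in the same format.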
Try this:
Sample:
import pandas as pd
df = pd.DataFrame({'Date': {0: '2016-01-04',
                            1: '2016-01-05',
                            2: '2016-01-06',
                            3: '2016-01-07',
                            4: '2016-01-08'}})
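Code:
def get_date_range(holidays):
    # Parse the holiday strings (the holidays list from the question)
    h = [pd.to_datetime(x) for x in holidays]
    # Build the daily range from 6 days before to 6 days after each holiday
    h = [pd.date_range(x - pd.DateOffset(days=6), x + pd.DateOffset(days=6)) for x in h]
    # Flatten to strings so they compare against the string 'Date' column
    h = [x.strftime('%Y-%m-%d') for y in h for x in y]
    return h
df['Within a week of Holiday'] = df['Date'].isin(get_date_range(holidays))*1
Result: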
Out[141]:
0 1
1 1
2 1
3 1
4 0
Name: Within a week of Holiday, dtype: int32
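Write a function that computes the dates around a given date and checks whether any of them is in holidays, returning True if so and False otherwise, then apply the function to the DataFrame. The window below is 6 days on either side, which matches the expected output (2016-01-08, exactly 7 days after 2016-01-01, is marked 0).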
import datetime
import pandas as pd
holidays = ['2016-01-01','2016-01-18','2017-11-23','2017-12-25']
def holiday_present(date):
    date = datetime.datetime.strptime(date, '%Y-%m-%d')
    # Symmetric window: range(-7, 7) would check 7 days ahead but only 6 back
    for i in range(-6, 7):
        candidate = (date - datetime.timedelta(days=i)).strftime('%Y-%m-%d')
        if candidate in holidays:
            return True
    return False
data = {
    "Date":[
        "2016-01-04",
        "2016-01-05",
        "2016-01-06",
        "2016-01-07",
        "2016-01-08"]
}
df = pd.DataFrame(data)
df["Within a week of Holiday"] = df["Date"].apply(holiday_present).astype(int)
Output:
Date Within a week of Holiday
0 2016-01-04 1
1 2016-01-05 1
2 2016-01-06 1
3 2016-01-07 1
4 2016-01-08 0
Turn the holidays into a DataFrame, then use merge_asof with a tolerance of 6 days. Passing direction='nearest' lets dates shortly before a holiday match as well; the default direction only looks backward, at holidays on or before each date:
new_df = pd.merge_asof(df, holidays, left_on='Date', right_on='Holiday',
                       tolerance=pd.Timedelta(days=6), direction='nearest')
new_df['Holiday'] = np.where(new_df['Holiday'].notnull(), 1, 0)
new_df = new_df.rename(columns={'Holiday': 'Within a week of Holiday'})
Complete working example:
import numpy as np
import pandas as pd
# merge_asof expects both frames to be sorted by their key columns
holidays = pd.DataFrame(pd.to_datetime(['2016-01-01', '2016-01-18']),
                        columns=['Holiday'])
df = pd.DataFrame({
    'Date': ['2016-01-04', '2016-01-05', '2016-01-06', '2016-01-07',
             '2016-01-08']
})
df['Date'] = pd.to_datetime(df['Date'])
new_df = pd.merge_asof(df, holidays, left_on='Date', right_on='Holiday',
                       tolerance=pd.Timedelta(days=6), direction='nearest')
new_df['Holiday'] = np.where(new_df['Holiday'].notnull(), 1, 0)
new_df = new_df.rename(columns={'Holiday': 'Within a week of Holiday'})
print(new_df)
new_df:
Date Within a week of Holiday
0 2016-01-04 1
1 2016-01-05 1
2 2016-01-06 1
3 2016-01-07 1
4 2016-01-08 0
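One detail worth checking with merge_asof is the match direction, since the default (`'backward'`) only considers holidays on or before each date. The sketch below compares the two settings on a small frame; the extra date 2016-01-15 (3 days before the 2016-01-18 holiday) and the helper name `flag` are assumptions added for illustration and do not appear in the original answer.

```python
import pandas as pd

holidays = pd.DataFrame({'Holiday': pd.to_datetime(['2016-01-01', '2016-01-18'])})
df = pd.DataFrame({'Date': pd.to_datetime(['2016-01-07', '2016-01-08', '2016-01-15'])})

def flag(direction):
    """Return 0/1 flags for df['Date'] using merge_asof with the given direction."""
    merged = pd.merge_asof(df, holidays, left_on='Date', right_on='Holiday',
                           tolerance=pd.Timedelta(days=6), direction=direction)
    # An unmatched row has NaT in 'Holiday'
    return merged['Holiday'].notna().astype(int).tolist()

backward = flag('backward')  # only holidays on or before each date
nearest = flag('nearest')    # closest holiday on either side
print(backward)
print(nearest)
```

If the flag is meant to cover days leading up to a holiday as well as days after it, `'nearest'` appears to be the setting that matches the stated goal: 2016-01-15 should be flagged only in the second list.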
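Alternatively, convert the holidays to a NumPy datetime array, broadcast the subtraction across the "Date" column, compare the absolute difference (abs) with 7 days, and check whether any holiday matches: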
holidays = np.array(['2016-01-01', '2016-01-18']).astype('datetime64')
df['Within a week of Holiday'] = (
    abs(df['Date'].values - holidays[:, None]) < pd.Timedelta(days=7)
).any(axis=0).astype(int)
Complete working example:
import numpy as np
import pandas as pd
holidays = np.array(['2016-01-01', '2016-01-18']).astype('datetime64')
df = pd.DataFrame({
    'Date': ['2016-01-04', '2016-01-05', '2016-01-06', '2016-01-07',
             '2016-01-08']
})
df['Date'] = pd.to_datetime(df['Date'])
# Rows are holidays, columns are dates; any(axis=0) collapses over holidays
df['Within a week of Holiday'] = (
    abs(df['Date'].values - holidays[:, None]) < pd.Timedelta(days=7)
).any(axis=0).astype(int)
print(df)
df:
Date Within a week of Holiday
0 2016-01-04 1
1 2016-01-05 1
2 2016-01-06 1
3 2016-01-07 1
4 2016-01-08 0
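The broadcast comparison builds an intermediate array with one row per holiday and one column per date, which may become large when there are many records. A possible alternative, not taken from the answers above, is to sort the holidays and use `np.searchsorted` to find the neighbouring holiday on each side of every date. The helper name `within_days` and its `days` parameter are hypothetical, and the sketch assumes a non-empty holiday list and whole-day dates (for which `<= 6 days` is equivalent to the `< 7 days` test above).

```python
import numpy as np
import pandas as pd

def within_days(dates, holidays, days=6):
    """Flag each date whose nearest holiday is at most `days` days away."""
    d = pd.to_datetime(dates).to_numpy().astype('datetime64[D]')
    h = np.sort(pd.to_datetime(holidays).to_numpy().astype('datetime64[D]'))
    # Index of the first holiday on or after each date
    idx = np.searchsorted(h, d)
    # Neighbouring holidays on either side, clipped at the array ends
    after = h[np.clip(idx, 0, len(h) - 1)]
    before = h[np.clip(idx - 1, 0, len(h) - 1)]
    nearest = np.minimum(np.abs(after - d), np.abs(d - before))
    return (nearest <= np.timedelta64(days, 'D')).astype(int)

df = pd.DataFrame({
    'Date': ['2016-01-04', '2016-01-05', '2016-01-06', '2016-01-07',
             '2016-01-08']
})
df['Within a week of Holiday'] = within_days(df['Date'], ['2016-01-01', '2016-01-18'])
print(df)
```

This keeps memory proportional to the number of dates and does a binary search per date instead of comparing against every holiday; whether it is actually faster for about 20 holidays would need to be measured.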