Split data set into train and test for time series analysis in Python
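Hi, I'm doing time series analysis in Python and want to split the data set into train and test sets using the following conditions:
For train: dates in df1 <= 2018-01-05, dropping any Visit_Datetime for which Weekly_Holiday == 1 in df2.
For test: all remaining rows of df1, again excluding dates where Weekly_Holiday == 1 in df2.
Please suggest how to do this split.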
df1:
  Store_ID Visit_Datetime Visitors
0    ABC_1     01-01-2018       45
1    ABC_1     02-01-2018       60
2    ABC_1     03-01-2018       40
3    ABC_1     04-01-2018       80
4    ABC_1     05-01-2018       60
5    ABC_1     06-01-2018       50
6    ABC_1     07-01-2018       70
7    ABC_1     08-01-2018       30
8    ABC_1     09-01-2018       50
9    ABC_1     10-01-2018       60
df2:
    Datetime       Day Weekly_Holiday
0 01-01-2018    Monday              1
1 02-01-2018   Tuesday              0
2 03-01-2018 Wednesday              1
3 04-01-2018  Thursday              0
4 05-01-2018    Friday              0
5 06-01-2018  Saturday              1
6 07-01-2018    Sunday              0
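The dates above are strings such as 01-01-2018, which could be read as day-month-year or month-day-year. Because df2 labels them Monday through Sunday, one way to settle the format is to parse them both ways and compare the weekday names with the Day column. A minimal sketch, rebuilding only the df2 sample:

```python
import pandas as pd

# Rebuild the df2 sample shown above
df2 = pd.DataFrame({
    "Datetime": ["01-01-2018", "02-01-2018", "03-01-2018", "04-01-2018",
                 "05-01-2018", "06-01-2018", "07-01-2018"],
    "Day": ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"],
    "Weekly_Holiday": [1, 0, 1, 0, 0, 1, 0],
})

# Parse the same strings two ways
day_first = pd.to_datetime(df2["Datetime"], format="%d-%m-%Y")
month_first = pd.to_datetime(df2["Datetime"], format="%m-%d-%Y")

# Check which interpretation agrees with the Day column on every row
day_first_matches = (day_first.dt.day_name() == df2["Day"]).all()
month_first_matches = (month_first.dt.day_name() == df2["Day"]).all()

print(day_first_matches, month_first_matches)  # True False
```

Only the day-month-year reading matches (1 January 2018 was a Monday, while 2 February 2018 was not a Tuesday), so the cutoff 2018-01-05 refers to the fifth row of df1.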
I'm not sure I've understood correctly, but you could try this approach.
import pandas as pd

# creating dictionaries and data frames
df1 = {'Store_ID': ['ABC_1', 'ABC_1', 'ABC_1', 'ABC_1', 'ABC_1', 'ABC_1', 'ABC_1', 'ABC_1', 'ABC_1', 'ABC_1'],
'Visit_Datetime':['01-01-2018', '02-01-2018', '03-01-2018', '04-01-2018', '05-01-2018', '06-01-2018','07-01-2018', '08-01-2018', '09-01-2018', '10-01-2018'],
'Visitors': [45, 60, 40, 80, 60, 50, 70, 30, 50, 60]}
df2 = {'Datetime': ['01-01-2018', '02-01-2018', '03-01-2018', '04-01-2018', '05-01-2018', '06-01-2018', '07-01-2018'],
'Day': ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'],
'Weekly_Holiday':[1,0,1,0,0,1,0]}
df1 = pd.DataFrame(df1)
df2 = pd.DataFrame(df2)
# setting dates as datetime objects; the strings are day-month-year
# (df2 lists 01-01-2018 as a Monday, which matches 1 January 2018)
df1['Visit_Datetime'] = pd.to_datetime(df1['Visit_Datetime'], format='%d-%m-%Y')
df2['Datetime'] = pd.to_datetime(df2['Datetime'], format='%d-%m-%Y')
# merging on dates; the left merge keeps visit dates missing from df2,
# which get NaN for Weekly_Holiday and therefore pass the != 1 filter
merged_df = pd.merge(df1, df2, how="left", left_on='Visit_Datetime', right_on='Datetime')
# splitting into train & test data frames, dropping weekly holidays from both
train_df = merged_df[merged_df['Visit_Datetime'] <= '2018-01-05']
train_df = train_df[train_df['Weekly_Holiday'] != 1]
test_df = merged_df[merged_df['Visit_Datetime'] > '2018-01-05']
test_df = test_df[test_df['Weekly_Holiday'] != 1]
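The same split can also be done without a merge by collecting the holiday dates from df2 and filtering df1 with `Series.isin`. This is a sketch under two assumptions: the dates are day-month-year, and the cutoff 2018-01-05 means 5 January. `split_train_test` is a hypothetical helper name, and the Day column is left out because the split does not need it. As with a left merge followed by `!= 1`, dates that do not appear in df2 (8 to 10 January here) are treated as non-holidays.

```python
import pandas as pd

def split_train_test(visits, calendar, cutoff):
    """Hypothetical helper: chronological split that drops weekly holidays.

    Rows on or before `cutoff` go to train, later rows go to test; rows whose
    date has Weekly_Holiday == 1 in `calendar` are dropped from both.
    Dates absent from `calendar` are kept (treated as non-holidays).
    """
    holidays = calendar.loc[calendar["Weekly_Holiday"] == 1, "Datetime"]
    not_holiday = ~visits["Visit_Datetime"].isin(holidays)
    cutoff = pd.Timestamp(cutoff)
    train = visits[(visits["Visit_Datetime"] <= cutoff) & not_holiday]
    test = visits[(visits["Visit_Datetime"] > cutoff) & not_holiday]
    return train, test

# Sample data from the question, parsed as day-month-year (an assumption)
df1 = pd.DataFrame({
    "Store_ID": ["ABC_1"] * 10,
    "Visit_Datetime": pd.to_datetime(
        [f"{d:02d}-01-2018" for d in range(1, 11)], format="%d-%m-%Y"),
    "Visitors": [45, 60, 40, 80, 60, 50, 70, 30, 50, 60],
})
df2 = pd.DataFrame({
    "Datetime": pd.to_datetime(
        [f"{d:02d}-01-2018" for d in range(1, 8)], format="%d-%m-%Y"),
    "Weekly_Holiday": [1, 0, 1, 0, 0, 1, 0],
})

train_df, test_df = split_train_test(df1, df2, "2018-01-05")
print(train_df["Visit_Datetime"].dt.day.tolist())  # [2, 4, 5]
print(test_df["Visit_Datetime"].dt.day.tolist())   # [7, 8, 9, 10]
```

Because the split is by date rather than random, every training row precedes every test row, which avoids leaking future observations into training.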