Hi, I am doing a time series analysis in Python and want to split the dataset into train and test sets with the following conditions:
For train: rows in df1 with Visit_Datetime <= 2018-01-05, excluding any dates where Weekly_Holiday == 1 in df2.
For test: all remaining rows in df1, again excluding dates where Weekly_Holiday == 1 in df2.
Please suggest how to do the split.
df1:
  Store_ID Visit_Datetime  Visitors
0    ABC_1     01-01-2018        45
1    ABC_1     02-01-2018        60
2    ABC_1     03-01-2018        40
3    ABC_1     04-01-2018        80
4    ABC_1     05-01-2018        60
5    ABC_1     06-01-2018        50
6    ABC_1     07-01-2018        70
7    ABC_1     08-01-2018        30
8    ABC_1     09-01-2018        50
9    ABC_1     10-01-2018        60
df2:
     Datetime        Day  Weekly_Holiday
0  01-01-2018     Monday               1
1  02-01-2018    Tuesday               0
2  03-01-2018  Wednesday               1
3  04-01-2018   Thursday               0
4  05-01-2018     Friday               0
5  06-01-2018   Saturday               1
6  07-01-2018     Sunday               0
I'm not quite sure I understood the question correctly, but you could try this approach.
import pandas as pd

# creating dictionaries and data frames
df1 = {'Store_ID': ['ABC_1', 'ABC_1', 'ABC_1', 'ABC_1', 'ABC_1', 'ABC_1', 'ABC_1', 'ABC_1', 'ABC_1', 'ABC_1'],
       'Visit_Datetime': ['01-01-2018', '02-01-2018', '03-01-2018', '04-01-2018', '05-01-2018', '06-01-2018', '07-01-2018', '08-01-2018', '09-01-2018', '10-01-2018'],
       'Visitors': [45, 60, 40, 80, 60, 50, 70, 30, 50, 60]}
df2 = {'Datetime': ['01-01-2018', '02-01-2018', '03-01-2018', '04-01-2018', '05-01-2018', '06-01-2018', '07-01-2018'],
       'Day': ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'],
       'Weekly_Holiday': [1, 0, 1, 0, 0, 1, 0]}
df1 = pd.DataFrame(df1)
df2 = pd.DataFrame(df2)
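# parsing dates as datetime objects; the strings are day-first (dd-mm-yyyy),
# which matches the Day column in df2 (01-01-2018 is a Monday)
df1['Visit_Datetime'] = pd.to_datetime(df1['Visit_Datetime'], format='%d-%m-%Y')
df2['Datetime'] = pd.to_datetime(df2['Datetime'], format='%d-%m-%Y')
# merging on dates; visit dates missing from df2 get NaN for Weekly_Holiday
merged_df = pd.merge(df1, df2, how="left", left_on='Visit_Datetime', right_on='Datetime')
# splitting into train & test data frames at the cutoff 2018-01-05 (5 January 2018)
train_df = merged_df[merged_df['Visit_Datetime'] <= '2018-01-05']
# dropping weekly holidays; NaN != 1 is True, so dates not listed in df2 are kept
train_df = train_df[train_df['Weekly_Holiday'] != 1]
test_df = merged_df[merged_df['Visit_Datetime'] > '2018-01-05']
test_df = test_df[test_df['Weekly_Holiday'] != 1]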
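If you would rather not merge, here is a minimal sketch of the same split using a boolean mask. It assumes the dates are day-first (dd-mm-yyyy), which is what the Day column in df2 suggests (01-01-2018 is a Monday), and that visit dates not listed in df2 count as non-holidays. The helper name split_train_test and its parameters are my own, not something from the question.

```python
import pandas as pd

# Sample data from the question (dates assumed to be dd-mm-yyyy).
df1 = pd.DataFrame({
    "Store_ID": ["ABC_1"] * 10,
    "Visit_Datetime": [f"{day:02d}-01-2018" for day in range(1, 11)],
    "Visitors": [45, 60, 40, 80, 60, 50, 70, 30, 50, 60],
})
df2 = pd.DataFrame({
    "Datetime": [f"{day:02d}-01-2018" for day in range(1, 8)],
    "Day": ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"],
    "Weekly_Holiday": [1, 0, 1, 0, 0, 1, 0],
})

df1["Visit_Datetime"] = pd.to_datetime(df1["Visit_Datetime"], format="%d-%m-%Y")
df2["Datetime"] = pd.to_datetime(df2["Datetime"], format="%d-%m-%Y")

# Sanity check on the assumed format: the parsed weekday names
# should agree with the Day column given in df2.
assert (df2["Datetime"].dt.day_name() == df2["Day"]).all()


def split_train_test(visits, calendar, cutoff):
    """Hypothetical helper: drop holiday dates, then split at the cutoff date."""
    # Dates flagged as weekly holidays in the calendar frame.
    holidays = calendar.loc[calendar["Weekly_Holiday"] == 1, "Datetime"]
    # Keep only visits that do not fall on a holiday date.
    working = visits[~visits["Visit_Datetime"].isin(holidays)]
    # Train is everything up to and including the cutoff; test is the rest.
    is_train = working["Visit_Datetime"] <= pd.Timestamp(cutoff)
    return working[is_train], working[~is_train]


train_df, test_df = split_train_test(df1, df2, "2018-01-05")
print(train_df)
print(test_df)
```

With the sample data, train should hold 2, 4 and 5 January, and test should hold 7 to 10 January. The mask version keeps only the df1 columns, while the merge version also carries the Day and Weekly_Holiday columns along, so pick whichever suits the later modelling step.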