Split data set into train and test for time series analysis in Python

Hi, I am doing a time series analysis in Python and want to split the dataset into train and test sets under the following conditions:

  1. I have two datasets: df1 holds all the data, and df2 marks holidays (Weekly_Holiday is 1 for a holiday and 0 otherwise).
  2. For train: rows of df1 with Visit_Datetime <= 2018-01-05, excluding dates where Weekly_Holiday == 1 in df2.
  3. For test: all remaining rows of df1, again excluding dates where Weekly_Holiday == 1 in df2.

Please suggest how to do this split.

df1:

  Store_ID Visit_Datetime  Visitors
0    ABC_1     01-01-2018        45
1    ABC_1     02-01-2018        60
2    ABC_1     03-01-2018        40
3    ABC_1     04-01-2018        80
4    ABC_1     05-01-2018        60
5    ABC_1     06-01-2018        50
6    ABC_1     07-01-2018        70
7    ABC_1     08-01-2018        30
8    ABC_1     09-01-2018        50
9    ABC_1     10-01-2018        60


df2:

     Datetime        Day  Weekly_Holiday
0  01-01-2018     Monday               1
1  02-01-2018    Tuesday               0
2  03-01-2018  Wednesday               1
3  04-01-2018   Thursday               0
4  05-01-2018     Friday               0
5  06-01-2018   Saturday               1
6  07-01-2018     Sunday               0

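I'm not quite sure I understand the question correctly, but you could try this approach: merge the holiday flags from df2 into df1 on the date, split at the cutoff date, and drop the holiday rows from each part.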

import pandas as pd

# creating dictionaries and data frames
df1 = {'Store_ID': ['ABC_1', 'ABC_1', 'ABC_1', 'ABC_1', 'ABC_1', 'ABC_1', 'ABC_1', 'ABC_1', 'ABC_1', 'ABC_1'],
       'Visit_Datetime':['01-01-2018', '02-01-2018', '03-01-2018', '04-01-2018', '05-01-2018', '06-01-2018','07-01-2018', '08-01-2018', '09-01-2018', '10-01-2018'],
       'Visitors': [45, 60, 40, 80, 60, 50, 70, 30, 50, 60]}

df2 = {'Datetime': ['01-01-2018', '02-01-2018', '03-01-2018', '04-01-2018', '05-01-2018', '06-01-2018', '07-01-2018'], 
       'Day': ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'], 
       'Weekly_Holiday':[1,0,1,0,0,1,0]}


df1 = pd.DataFrame(df1)
df2 = pd.DataFrame(df2)

# parsing dates as datetime objects (the sample dates are day-first: DD-MM-YYYY)
df1['Visit_Datetime'] = pd.to_datetime(df1['Visit_Datetime'], format='%d-%m-%Y')
df2['Datetime'] = pd.to_datetime(df2['Datetime'], format='%d-%m-%Y')

# merging on dates; rows of df1 with no match in df2 get NaN in Weekly_Holiday
merged_df = pd.merge(df1, df2, how="left", left_on='Visit_Datetime', right_on='Datetime')

# splitting into train & test data frames at the cutoff date 2018-01-05
train_df = merged_df[merged_df['Visit_Datetime'] <= '2018-01-05']

# dropping weekly holidays from train (NaN != 1 is True, so unmatched dates are kept)
train_df = train_df[train_df['Weekly_Holiday'] != 1]

test_df = merged_df[merged_df['Visit_Datetime'] > '2018-01-05']

# dropping weekly holidays from test
test_df = test_df[test_df['Weekly_Holiday'] != 1]
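
If you would rather not carry the extra df2 columns around, the same split can probably be done without a merge by filtering on the set of holiday dates. This is only a minimal sketch: `split_train_test` and its `cutoff` parameter are names made up for illustration, and it assumes the dates are day-first (DD-MM-YYYY), as the Day column in df2 suggests, and that dates missing from df2 are not holidays.

```python
import pandas as pd


def split_train_test(df1, df2, cutoff="2018-01-05"):
    """Split df1 at `cutoff`, dropping dates flagged as weekly holidays in df2.

    Hypothetical helper: assumes day-first date strings (DD-MM-YYYY).
    """
    visits = pd.to_datetime(df1["Visit_Datetime"], format="%d-%m-%Y")
    holiday_dates = pd.to_datetime(
        df2.loc[df2["Weekly_Holiday"] == 1, "Datetime"], format="%d-%m-%Y"
    )
    keep = ~visits.isin(holiday_dates)          # True for non-holiday rows
    in_train = visits <= pd.Timestamp(cutoff)   # True up to and including the cutoff
    return df1[keep & in_train], df1[keep & ~in_train]


# Sample data from the question
df1 = pd.DataFrame({
    "Store_ID": ["ABC_1"] * 10,
    "Visit_Datetime": [f"{day:02d}-01-2018" for day in range(1, 11)],
    "Visitors": [45, 60, 40, 80, 60, 50, 70, 30, 50, 60],
})
df2 = pd.DataFrame({
    "Datetime": [f"{day:02d}-01-2018" for day in range(1, 8)],
    "Day": ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"],
    "Weekly_Holiday": [1, 0, 1, 0, 0, 1, 0],
})

train_df, test_df = split_train_test(df1, df2)
print(train_df["Visit_Datetime"].tolist())
# ['02-01-2018', '04-01-2018', '05-01-2018']
print(test_df["Visit_Datetime"].tolist())
# ['07-01-2018', '08-01-2018', '09-01-2018', '10-01-2018']
```

Note that the split is purely by date, with no shuffling, which is what you normally want for time series so that the test set lies strictly after the train set.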


 