
Split data set into train and test for time series analysis in Python


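Hi, I am doing time series analysis in Python and want to split the data set into train and test using the following conditions:

  1. I have 2 data sets (df1 = all data, df2 = holidays, where 1 marks a holiday and 0 marks no holiday)
  2. For train: dates in df1 <= 2018-01-05, removing any Visit_Datetime for which Weekly_Holiday == 1 in df2
  3. For test: all remaining rows of df1, except those for which Weekly_Holiday == 1 in df2

Please suggest how to do the split.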

df1:

  Store_ID Visit_Datetime  Visitors
0    ABC_1     01-01-2018        45
1    ABC_1     02-01-2018        60
2    ABC_1     03-01-2018        40
3    ABC_1     04-01-2018        80
4    ABC_1     05-01-2018        60
5    ABC_1     06-01-2018        50
6    ABC_1     07-01-2018        70
7    ABC_1     08-01-2018        30
8    ABC_1     09-01-2018        50
9    ABC_1     10-01-2018        60


df2:

     Datetime        Day  Weekly_Holiday
0  01-01-2018     Monday               1
1  02-01-2018    Tuesday               0
2  03-01-2018  Wednesday               1
3  04-01-2018   Thursday               0
4  05-01-2018     Friday               0
5  06-01-2018   Saturday               1
6  07-01-2018     Sunday               0

I'm not quite sure I understood the question correctly, but you can try this approach.

import pandas as pd

# creating dictionaries and data frames
df1 = {'Store_ID': ['ABC_1', 'ABC_1', 'ABC_1', 'ABC_1', 'ABC_1', 'ABC_1', 'ABC_1', 'ABC_1', 'ABC_1', 'ABC_1'],
       'Visit_Datetime':['01-01-2018', '02-01-2018', '03-01-2018', '04-01-2018', '05-01-2018', '06-01-2018','07-01-2018', '08-01-2018', '09-01-2018', '10-01-2018'],
       'Visitors': [45, 60, 40, 80, 60, 50, 70, 30, 50, 60]}

df2 = {'Datetime': ['01-01-2018', '02-01-2018', '03-01-2018', '04-01-2018', '05-01-2018', '06-01-2018', '07-01-2018'], 
       'Day': ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'], 
       'Weekly_Holiday':[1,0,1,0,0,1,0]}


df1 = pd.DataFrame(df1)
df2 = pd.DataFrame(df2)

# setting dates as datetime objects
# (the sample dates are day-month-year: df2 lists 01-01-2018 as a Monday)
df1['Visit_Datetime'] = pd.to_datetime(df1['Visit_Datetime'], format='%d-%m-%Y')
df2['Datetime'] = pd.to_datetime(df2['Datetime'], format='%d-%m-%Y')

# merging on dates; dates missing from df2 get NaN in Weekly_Holiday
merged_df = pd.merge(df1, df2, how="left", left_on='Visit_Datetime', right_on='Datetime')

# splitting into train & test data frames (cutoff 2018-01-05, as in the question)
train_df = merged_df[merged_df['Visit_Datetime'] <= '2018-01-05']

# dropping weekly holidays from the train set
train_df = train_df[train_df['Weekly_Holiday'] != 1]

test_df = merged_df[merged_df['Visit_Datetime'] > '2018-01-05']

# dropping weekly holidays from the test set (NaN rows are kept)
test_df = test_df[test_df['Weekly_Holiday'] != 1]
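
As a rough alternative, the same split can probably be done without a merge: collect the holiday dates from df2 and filter df1 with boolean masks. This is only a sketch under a few assumptions: the dates are day-first (df2 lists 01-01-2018 as a Monday), the cutoff is 2018-01-05 as stated in the question, and dates that do not appear in df2 count as non-holidays. The helper name split_train_test is made up for illustration.

```python
import pandas as pd


def split_train_test(df1, df2, cutoff="2018-01-05"):
    """Hypothetical helper: split df1 at `cutoff`, dropping weekly holidays listed in df2."""
    visits = df1.copy()
    # Assumption: dates are day-first strings, e.g. '05-01-2018' is 5 Jan 2018
    visits["Visit_Datetime"] = pd.to_datetime(visits["Visit_Datetime"], format="%d-%m-%Y")
    holiday_dates = pd.to_datetime(
        df2.loc[df2["Weekly_Holiday"] == 1, "Datetime"], format="%d-%m-%Y"
    )
    # Assumption: dates absent from df2 are treated as non-holidays
    is_holiday = visits["Visit_Datetime"].isin(holiday_dates)
    in_train_window = visits["Visit_Datetime"] <= pd.Timestamp(cutoff)
    train = visits[in_train_window & ~is_holiday]
    test = visits[~in_train_window & ~is_holiday]
    return train, test


# Sample data copied from the question
df1 = pd.DataFrame({
    "Store_ID": ["ABC_1"] * 10,
    "Visit_Datetime": ["01-01-2018", "02-01-2018", "03-01-2018", "04-01-2018", "05-01-2018",
                       "06-01-2018", "07-01-2018", "08-01-2018", "09-01-2018", "10-01-2018"],
    "Visitors": [45, 60, 40, 80, 60, 50, 70, 30, 50, 60],
})
df2 = pd.DataFrame({
    "Datetime": ["01-01-2018", "02-01-2018", "03-01-2018", "04-01-2018",
                 "05-01-2018", "06-01-2018", "07-01-2018"],
    "Day": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"],
    "Weekly_Holiday": [1, 0, 1, 0, 0, 1, 0],
})

train_df, test_df = split_train_test(df1, df2)
print(train_df)
print(test_df)
```

With the sample data, the train set should keep 2, 4 and 5 January, and the test set 7 to 10 January. Keeping the split chronological (no shuffling) matters for time series, since the test rows must come after the train rows.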
