Time series dataset train/test split (ML)

On machinelearningmastery there is a post about how to create a supervised-learning regression dataset from a single time series variable.

For example, this:

time, measure
1, 100
2, 110
3, 108
4, 115
5, 120

can be turned into the following after passing the data through the function series_to_supervised:

X, y
?, 100
100, 110
110, 108
108, 115
115, 120
120, ?

In the Multi-Step or Sequence Forecasting section of the machinelearningmastery post, series_to_supervised can output the following:

   var1(t-2)  var1(t-1)  var1(t)  var1(t+1)
2        0.0        1.0        2        3.0
3        1.0        2.0        3        4.0
4        2.0        3.0        4        5.0
5        3.0        4.0        5        6.0
6        4.0        5.0        6        7.0
7        5.0        6.0        7        8.0
8        6.0        7.0        8        9.0
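For reference, a table like the one above can be reproduced with pandas shift. This is only a minimal sketch: frame_series is a hypothetical stand-in written for illustration, not the series_to_supervised function from the post, and it assumes a univariate series of the integers 0 to 9 with two lag steps and two output steps.

```python
import pandas as pd

def frame_series(values, n_in=1, n_out=1):
    """Hypothetical helper: frame a univariate series as lagged inputs plus future steps."""
    s = pd.Series(values)
    cols = {}
    # Past observations: t-n_in ... t-1
    for i in range(n_in, 0, -1):
        cols[f"var1(t-{i})"] = s.shift(i)
    # Current and future observations: t ... t+n_out-1
    for i in range(n_out):
        name = "var1(t)" if i == 0 else f"var1(t+{i})"
        cols[name] = s.shift(-i)
    # Rows at either edge lack a full window (they contain NaN), so drop them.
    return pd.DataFrame(cols).dropna()

framed = frame_series(range(10), n_in=2, n_out=2)
print(framed)
```

Each row holds the two previous values, the current value, and the next value, so the last column is the only one that lies in the future relative to time t.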

My question is: how would I define the X and y train/test split? I am assuming var1(t) would be defined as y, right? For example, would the code below be correct for trainX and trainy? I am experimenting with:

import numpy as np
from xgboost import XGBRegressor

# frame the time series as supervised-learning columns
train = series_to_supervised(need_to_train, 11, 14)

# split into inputs (X) and target (y)
trainX = np.array(train.drop(columns=['var1(t)']))
trainy = np.array(train['var1(t)'])

model = XGBRegressor(objective='reg:squarederror', n_estimators=100)

No, var1(t+1) would be the target and taken as y. The whole point is to predict the next step in the future from the current (and past) data.
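A minimal sketch of that split, assuming the four-column frame from the question's multi-step example built on a toy series of 0 to 9. The 70/30 chronological holdout is an added assumption (a common convention for time series, not something stated in the answer), as are the names target_col and n_train.

```python
import numpy as np
import pandas as pd

# Toy series standing in for the real data (an assumption for illustration).
s = pd.Series(np.arange(10, dtype=float))

# Frame with two lags, the current value, and the next step.
framed = pd.DataFrame({
    "var1(t-2)": s.shift(2),
    "var1(t-1)": s.shift(1),
    "var1(t)": s,
    "var1(t+1)": s.shift(-1),
}).dropna()

# The future step is the target; everything up to time t is input.
target_col = "var1(t+1)"
X = framed.drop(columns=[target_col]).to_numpy()
y = framed[target_col].to_numpy()

# Chronological split: earliest rows train, latest rows test, no shuffling.
n_train = int(len(framed) * 0.7)
trainX, testX = X[:n_train], X[n_train:]
trainy, testy = y[:n_train], y[n_train:]

print(trainX.shape, trainy.shape, testX.shape, testy.shape)
```

If the frame has more than one future column, as it would when more output steps are requested, every future column has to stay out of X, otherwise the model sees values it is supposed to predict.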
