Train-Test split for Time Series Data to be used for LSTM

from sklearn.model_selection import train_test_split

values = df.values
train, test = train_test_split(values)

# Split into inputs (all columns but the last) and target (last column)
X_train, y_train = train[:, :-1], train[:, -1]
X_test, y_test = test[:, :-1], test[:, -1]

Executing the above code splits the time series dataset into 75% training and 25% testing. I want to control the train-test split as 80-20 or 90-10. Can someone please help me understand how to split the dataset into any ratio I want?

The concept is borrowed from https://machinelearningmastery.com/multivariate-time-series-forecasting-lstms-keras/ .

Note: I cannot split the dataset randomly into train and test; the most recent values have to be used for testing. I have included a screenshot of my dataset.

If anyone can interpret the code, please help me understand the above. Thanks.

Here's the documentation.

Basically, you'll want to do something like train_test_split(values, test_size=.2, shuffle=False)

test_size=.2 tells the function to make the test set 20% of the input data (you can similarly specify the training set size with train_size=n, but if you omit it the function uses 1-test_size, i.e. the complement of the test set). For a 90-10 split, use test_size=.1.

shuffle=False tells the function not to randomly shuffle the order, so the test set consists of the last (most recent) rows.
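As a minimal sketch of the call above, assuming a small made-up array stands in for df.values (the numbers are illustrative and not taken from the question's dataset):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for df.values: 10 time-ordered rows, 3 columns
# (the last column plays the role of the target, as in the question).
values = np.arange(30).reshape(10, 3)

# test_size=0.2 holds out 20% of the rows for testing;
# shuffle=False keeps the original order, so the held-out rows are the last ones.
train, test = train_test_split(values, test_size=0.2, shuffle=False)

X_train, y_train = train[:, :-1], train[:, -1]
X_test, y_test = test[:, :-1], test[:, -1]

print(train.shape, test.shape)  # (8, 3) (2, 3)
```

With 10 rows, the first 8 end up in train and the last 2 in test; changing test_size to 0.1 would hold out only the final row.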

First, you should divide your data into train and test sets using slicing or sklearn's train_test_split (remember to use shuffle=False for time-series data).

#divide data into train and test
train_ind = int(len(df)*0.8)
train = df[:train_ind]
test = df[train_ind:]

Then, you want to use Keras' TimeseriesGenerator to generate sequences for the LSTM to use as input. This blog does a good job of explaining its usage.

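from keras.preprocessing.sequence import TimeseriesGenerator

n_input = 2  # number of past time steps in each input sequence
# Pass NumPy arrays: integer indexing on a DataFrame selects columns, not rows
generator = TimeseriesGenerator(train.values, targets=train.values, length=n_input)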
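To make "generating sequences" concrete, here is a hand-rolled sliding-window sketch in plain NumPy. It is not Keras' implementation; the helper name make_windows and the toy series are assumptions for illustration only. Each sample holds n_input consecutive rows, and its target is the row that immediately follows.

```python
import numpy as np

def make_windows(data, n_input):
    """Hypothetical helper: X[i] is data[i:i+n_input], y[i] is data[i+n_input]."""
    data = np.asarray(data)
    X = np.array([data[i:i + n_input] for i in range(len(data) - n_input)])
    y = data[n_input:]
    return X, y

series = np.arange(10).reshape(-1, 1)  # toy univariate series: 0..9
train = series[:8]                     # first 80% of the rows, in time order

X, y = make_windows(train, n_input=2)
print(X.shape, y.shape)  # (6, 2, 1) (6, 1)
```

The first sample is the window [0, 1] with target 2, and the last is [5, 6] with target 7. X has the (samples, time steps, features) layout that an LSTM layer expects as input.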
