Train-test split of time series data for LSTM

Question

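from sklearn.model_selection import train_test_split

values = df.values
train, test = train_test_split(values)

# Split each part into input features (all but the last column) and target (last column)
X_train, y_train = train[:, :-1], train[:, -1]
X_test, y_test = test[:, :-1], test[:, -1]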

Executing the above code splits the time series dataset into 75% training and 25% testing. I want to control the train-test split as 80-20 or 90-10. Can someone please help me understand how to split the dataset into any ratio I want?

The concept is borrowed from https://machinelearningmastery.com/multivariate-time-series-forecasting-lstms-keras/.

Note: I cannot split the dataset randomly for train and test, and the most recent values have to be used for testing. I have included a screenshot of my dataset.

If anyone can interpret the code, please help me understand the above. Thanks.

Answer 1

Here's the documentation for train_test_split.

Basically, you'll want to do something like train_test_split(values, test_size=.2, shuffle=False)

test_size=.2 tells the function to make the test set 20% of the input data (you can similarly specify the training set size with train_size=n, but in the absence of this specification the function will use 1-test_size, i.e. the complement of the test set).

shuffle=False tells the function not to randomly shuffle the order.
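As a rough illustration of the two parameters, here is a minimal sketch on a small made-up array. The array `values` below is a hypothetical stand-in for df.values (10 time steps, 3 columns, last column as the target); it is not the asker's data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for df.values: 10 rows in time order, 3 columns.
values = np.arange(30).reshape(10, 3)

# 80-20 split; shuffle=False keeps the rows in their original order,
# so the most recent 20% of rows end up in the test set.
train, test = train_test_split(values, test_size=0.2, shuffle=False)

# 90-10 split works the same way with a different test_size.
train_90, test_10 = train_test_split(values, test_size=0.1, shuffle=False)

# Separate input features (all but the last column) from the target (last column).
X_train, y_train = train[:, :-1], train[:, -1]
X_test, y_test = test[:, :-1], test[:, -1]

print(train.shape, test.shape)
print(train_90.shape, test_10.shape)
```

Because nothing is shuffled, the test rows are simply the tail of the array, which is what the question asks for.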

Answer 2

First you should divide your data into train and test using slicing or sklearn's train_test_split (remember to use shuffle=False for time-series data).

#divide data into train and test
train_ind = int(len(df)*0.8)
train = df[:train_ind]
test = df[train_ind:]

Then, you want to use Keras' TimeseriesGenerator to generate sequences for the LSTM to use as input. This blog does a good job explaining its usage.

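from keras.preprocessing.sequence import TimeseriesGenerator

n_input = 2  # length of each input sequence (number of past time steps)
# Pass NumPy arrays: the generator indexes rows by integer position
generator = TimeseriesGenerator(train.values, targets=train.values, length=n_input)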
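
To make the windowing idea concrete without depending on Keras, here is a NumPy-only sketch of the same pairing: each window of `length` consecutive rows is matched with the row that immediately follows it. The helper `make_windows` and the toy `series` are hypothetical names introduced for illustration; this is not the Keras implementation, and it ignores batching and the generator's other options.

```python
import numpy as np

def make_windows(data, targets, length):
    """Pair each window of `length` consecutive rows with the row that follows it."""
    X = np.array([data[i - length:i] for i in range(length, len(data))])
    y = np.array([targets[i] for i in range(length, len(data))])
    return X, y

# Hypothetical univariate series: 10 time steps, 1 feature.
series = np.arange(10, dtype=float).reshape(-1, 1)

# Chronological 80-20 split by slicing, as in the answer above.
train_ind = int(len(series) * 0.8)
train, test = series[:train_ind], series[train_ind:]

# Windows are built from the training part only, so no test rows leak in.
X_train, y_train = make_windows(train, train, length=2)

print(X_train.shape, y_train.shape)
```

The resulting X_train has the (samples, timesteps, features) layout that an LSTM layer expects as input.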

Answer 1: score 2, posted 2020-09-28 19:01:28
Answer 2: score 1, accepted, posted 2020-09-28 19:09:15
