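from sklearn.model_selection import train_test_split

values = df.values
# Split into train and test sets (by default 75% train, 25% test)
train, test = train_test_split(values)
# Split each set into inputs (all columns but the last) and output (last column)
X_train, y_train = train[:, :-1], train[:, -1]
X_test, y_test = test[:, :-1], test[:, -1]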
Executing the above code splits the time series dataset into 75% training and 25% testing. I want to control the train-test split, for example 80-20 or 90-10. Can someone please help me understand how to split the dataset into any ratio I want?
The concept is borrowed from https://machinelearningmastery.com/multivariate-time-series-forecasting-lstms-keras/ .
Note: I cannot split the dataset randomly into train and test; the most recent values have to be used for testing. I have included a screenshot of my dataset.
If anyone can interpret the code, please do help me understand the above. Thanks.
Basically, you'll want to do something like train_test_split(values, test_size=.2, shuffle=False).
test_size=.2 tells the function to make the test set 20% of the input data (you can similarly specify the training set size with train_size=n, but in the absence of this specification the function will use 1-test_size, i.e. the complement of the test set).
shuffle=False tells the function not to randomly shuffle the order.
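As a minimal sketch of the call described above, here is an 80-20 split on a small made-up array (the array contents and variable names are only for illustration, not the asker's data). With shuffle=False the rows keep their original order, so the last rows end up in the test set:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-in for df.values: 10 time steps, 2 columns (illustrative data only)
values = np.arange(20).reshape(10, 2)

# 80% train, 20% test, keeping chronological order
train, test = train_test_split(values, test_size=0.2, shuffle=False)

# Inputs are all columns but the last; the output is the last column
X_train, y_train = train[:, :-1], train[:, -1]
X_test, y_test = test[:, :-1], test[:, -1]

print(train.shape, test.shape)
```

For a 90-10 split, change test_size to 0.1.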
First you should divide your data into train and test using slicing or sklearn's train_test_split (remember to use shuffle=False for time-series data).
#divide data into train and test
train_ind = int(len(df)*0.8)
train = df[:train_ind]
test = df[train_ind:]
Then, you want to use Keras' TimeseriesGenerator to generate sequences for the LSTM to use as input. This blog does a good job explaining its usage.
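from keras.preprocessing.sequence import TimeseriesGenerator
n_input = 2  # number of past time steps in each input sequence
# Pass NumPy arrays rather than a DataFrame: the generator indexes rows by position
generator = TimeseriesGenerator(train.values, targets=train.values, length=n_input)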
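To see what kind of input/target pairs such a generator yields, here is a hand-rolled NumPy sketch of the same windowing idea. The helper make_windows is hypothetical, written only for illustration, and is not the Keras implementation: each sample is a run of `length` consecutive rows, and its target is the row that immediately follows.

```python
import numpy as np

def make_windows(data, targets, length):
    """Build (samples, targets) pairs from consecutive rows.

    Sample i is data[i : i + length]; its target is targets[i + length].
    Hypothetical helper for illustration; assumes len(data) > length.
    """
    n_samples = len(data) - length
    X = np.stack([data[i:i + length] for i in range(n_samples)])
    y = np.stack([targets[i + length] for i in range(n_samples)])
    return X, y

# Toy series: 5 time steps, 2 features (illustrative data only)
series = np.arange(10).reshape(5, 2)
X, y = make_windows(series, series, length=2)

print(X.shape, y.shape)
```

The resulting X has the (samples, time steps, features) layout that an LSTM layer expects.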