
Question about Train-Test Split in Time Series

Original post 2020-05-31 14:34:35 7 1 python

I have a question about splitting data into a training and a test set in time series tasks. I know that the data can't be shuffled, because it's important to keep the temporal order of the data, so that we do not create a scenario where the model can look into the future. However, when I shuffle the data (for experimenting), I get a ridiculously high R-squared score. And yes, the R-squared is evaluated on the test set. Can someone explain in simple terms why this is the case? Why does shuffling the train and test data in a time series produce a high R-squared score? My guess is that it has something to do with the trend of the time series, but I am not sure. I am just asking out of curiosity, thanks!

1 answer

It really depends on your problem. There are two cases:

If your model has no memory and the task is merely a mapping from inputs to outputs, then the attached timestamp has no significance, and shuffling the data is better, in fact recommended, because it gives a more even distribution across the sets. If this is your case and you are getting a higher R-squared value, you should definitely go for it. (I assume this is the case, since R-squared is usually used for these types of tasks.)
If your task is pattern dependent and each prediction affects the next one in the sequence, then order matters. In this case you should never shuffle the data, and any metric that suggests otherwise is lying. The best you can do is split the train and test sets on a timestamp: everything before it is the train set and everything after it is the test set. Then divide the train and test sets into fixed time windows. You can shuffle those windows, but only if the window span is large enough for your case.
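To make the second case concrete, here is a minimal sketch, not taken from the question's data. It assumes a hypothetical toy series (trend plus a slow cycle plus noise) and a deliberately simple 1-nearest-neighbour model on the time index; `predict_nearest`, `make_windows`, the cutoff of 400 and the window width of 50 are all illustrative choices. With a shuffled split, every test point has training points right next to it in time, so the model only has to interpolate and R-squared looks excellent. With a split on a timestamp, the same model has to extrapolate and the score collapses. The last lines show the fixed windows described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy series: linear trend + slow cycle + small noise.
t = np.arange(500)
y = 0.05 * t + np.sin(t / 20) + rng.normal(0.0, 0.1, size=t.size)


def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def predict_nearest(t_train, y_train, t_test):
    """1-nearest-neighbour on the time index: copy the closest training value."""
    nearest = np.abs(t_test[:, None] - t_train[None, :]).argmin(axis=1)
    return y_train[nearest]


def make_windows(values, width):
    """Cut a series into consecutive, non-overlapping windows of fixed width."""
    n_windows = len(values) // width
    return values[: n_windows * width].reshape(n_windows, width)


# Shuffled split: test points are surrounded in time by training points.
perm = rng.permutation(t.size)
train_idx, test_idx = perm[:400], perm[400:]
r2_shuffled = r2(
    y[test_idx], predict_nearest(t[train_idx], y[train_idx], t[test_idx])
)

# Split on a timestamp: train strictly before the cutoff, test after it.
cutoff = 400
r2_chrono = r2(y[cutoff:], predict_nearest(t[:cutoff], y[:cutoff], t[cutoff:]))

# Fixed time windows inside each side; only whole windows are reordered,
# and no window ever crosses the cutoff.
train_windows = make_windows(y[:cutoff], width=50)
test_windows = make_windows(y[cutoff:], width=50)
shuffled_train_windows = train_windows[rng.permutation(len(train_windows))]

print("R^2 with shuffled split:     ", r2_shuffled)
print("R^2 with chronological split:", r2_chrono)
print("train/test window shapes:    ", train_windows.shape, test_windows.shape)
```

The nearest-neighbour model is only a stand-in to make the leak visible. Any model that can exploit values adjacent in time (lag features, tree ensembles on a time index, and so on) may show a similar gap between the two splits, though how large it is depends on the data.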

Hope this helps!

Related questions

Train-Test split for Time Series Data to be used for LSTM

Problem in LSTM train-test split in time series data

Why does my kernel die every time I run train-test split on this particular dataset?

time series dataset train test split ML

How to train-test split and cross-validate in surprise?

Train-test split does not seem to work properly in Python?

Custom train-test split using two stratified classes

Is train_test_split(shuffle=False) appropriate for time series?

Split time series data into Train Test and Valid sets in Python

Split data set into train and test for time series analysis in python


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.


solution1 0 2020-05-31 16:44:14
