Question about Train-Test Split in Time Series
I have a question about splitting the data into training and test sets in time series tasks. I know that the data can't be shuffled, because it's important to preserve the temporal order of the data, so that we don't create a scenario where the model is able to look into the future. However, when I shuffle the data (as an experiment), I get a ridiculously high R-squared score. And yes, the R-squared is evaluated on the test set. Can someone explain simply why this is the case? Why does shuffling train and test data in a time series produce a high R-squared score?

My guess is that it has something to do with the trend of the time series, but I am not sure. I am just asking out of curiosity, thanks!
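To make the effect in the question concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration, not the asker's actual setup: the series is a synthetic random walk with drift, and the "model" is a hypothetical nearest-in-time predictor (`nearest_in_time_predict`) that simply copies the value of the closest training timestamp. With a shuffled split, almost every test point has a training point one or two steps away in time, so the model only has to interpolate between near-identical neighbours. With a chronological split, it has to extrapolate beyond the last training point.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def nearest_in_time_predict(train_t, train_y, test_t):
    """Hypothetical toy model: predict the value of the training
    observation whose timestamp is closest to the test timestamp."""
    preds = []
    for t in test_t:
        i = min(range(len(train_t)), key=lambda k: abs(train_t[k] - t))
        preds.append(train_y[i])
    return preds


random.seed(0)

# Synthetic series (an assumption): random walk with a small upward drift.
n = 500
times = list(range(n))
values = []
level = 0.0
for _ in range(n):
    level += 0.05 + random.gauss(0.0, 1.0)
    values.append(level)

cut = int(0.8 * n)

# Chronological split: train on the first 80%, test on the last 20%.
chrono_pred = nearest_in_time_predict(times[:cut], values[:cut], times[cut:])
r2_chrono = r_squared(values[cut:], chrono_pred)

# Shuffled split: test points are scattered between training points in time.
idx = list(range(n))
random.shuffle(idx)
train_idx, test_idx = idx[:cut], idx[cut:]
shuffled_pred = nearest_in_time_predict(
    [times[i] for i in train_idx],
    [values[i] for i in train_idx],
    [times[i] for i in test_idx],
)
r2_shuffled = r_squared([values[i] for i in test_idx], shuffled_pred)

print("R^2, chronological split:", round(r2_chrono, 3))
print("R^2, shuffled split:     ", round(r2_shuffled, 3))
```

In the chronological case this toy model predicts the last training value for every test point, and a constant prediction can never score above zero on R-squared. In the shuffled case the score is high even though the model has learned nothing about the future. It is only exploiting the fact that neighbouring observations in a time series are strongly correlated.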
It really depends upon your problem. If:
Hope this helps!