
Train / Val / Test split by time for an LSTM

I have a data set covering several months (from JAN-15 to SEPT-17), reporting each customer's financial situation for each month. My task is to predict the cumulative sales for each customer over the next 12 months.

My dataset looks like this (this is the raw data; for training I will create lagged features):

Month   CustomerID NetSales
JAN-15     A          10
JAN-15     B          10
JAN-15     C          10
FEB-15     A          10
FEB-15     B          10
FEB-15     C          10
...

How can I split it into TRAIN / VAL / TEST in a way that is consistent with time? Can I do something like this?

  • TRAIN --> all customers / months from JAN-15 to MAR-16 (I take each calendar month at least once so the model will learn seasonal patterns)
  • VAL --> all customers / months from APR-16 to JUN-16
  • TEST --> all customers / months from JUL-16 to SEP-16 (I stop here because I need the following 12 months to create the target variable)

Is this a consistent split strategy? Alternatively, what would you advise?

Thanks a lot, Andrea

Is this a consistent split strategy?

Yes. Your validation set contains no data from before the end of your training period, and the same holds for your test set relative to the validation set. Keeping the three sets in chronological order like this prevents data leakage, so it is the right way to do it.
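A minimal sketch of such a chronological split in pandas, assuming the raw data is a DataFrame with the Month, CustomerID and NetSales columns shown in the question. The helper names (to_period, split_by_time) and the toy data are hypothetical; the cut-off months are the ones proposed in the question.

```python
import pandas as pd

ABBR = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
        "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

def to_period(label):
    """Turn a label such as 'JAN-15' (or 'SEPT-17') into a monthly Period."""
    mon, yr = label.split("-")
    month = ABBR.index(mon.upper()[:3]) + 1
    return pd.Period(year=2000 + int(yr), month=month, freq="M")

def split_by_time(df, train_end, val_end, test_end):
    """Split rows into train/val/test so that no set overlaps another in time."""
    period = pd.PeriodIndex([to_period(m) for m in df["Month"]], freq="M")
    t_end, v_end, s_end = (to_period(x) for x in (train_end, val_end, test_end))
    train = df[period <= t_end]
    val = df[(period > t_end) & (period <= v_end)]
    test = df[(period > v_end) & (period <= s_end)]
    return train, val, test

# Toy data: three customers, one row per month from JAN-15 to SEP-17.
labels = [f"{ABBR[p.month - 1]}-{p.year % 100}"
          for p in pd.period_range("2015-01", "2017-09", freq="M")]
raw = pd.DataFrame([(m, c, 10) for m in labels for c in "ABC"],
                   columns=["Month", "CustomerID", "NetSales"])

train, val, test = split_by_time(raw, "MAR-16", "JUN-16", "SEP-16")
print(len(train), len(val), len(test))  # 45 9 9
```

Rows after SEP-16 end up in none of the three sets; in this sketch they are only kept around for building the 12-month target.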

Alternatively, what would you advise?

The only thing you might change is the proportion of data in your train, val and test sets, which you can experiment with. Because this is a time series, you should also make sure that every seasonal pattern is covered by your training data.
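One simple way to check the seasonal coverage when trying different proportions is to list the calendar months that a candidate training window does not contain. This is only a sketch: missing_calendar_months is a hypothetical helper, and the windows are given as year-month strings rather than the JAN-15 style labels.

```python
import pandas as pd

def missing_calendar_months(train_start, train_end):
    """Return the calendar months (1-12) that never occur in the training window."""
    months = pd.period_range(train_start, train_end, freq="M")
    return sorted(set(range(1, 13)) - set(months.month))

# The window proposed in the question covers every calendar month.
print(missing_calendar_months("2015-01", "2016-03"))  # []

# A shorter window would leave the last quarter unseen during training.
print(missing_calendar_months("2015-01", "2015-09"))  # [10, 11, 12]
```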
