Train / Val / Test split over time for an LSTM

I have a dataset covering several months (from JAN-15 to SEP-17) that reports each customer's financial situation for each month. My task is to predict the cumulative sales for each customer over the next 12 months.

My dataset looks like this (this is the raw data; for training I will create lagged features):

Month   CustomerID NetSales
JAN-15     A          10
JAN-15     B          10
JAN-15     C          10
FEB-15     A          10
FEB-15     B          10
FEB-15     C          10
...
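
For illustration, here is a minimal sketch of how the lagged features and the 12-month cumulative target could be derived from a table of this shape. It assumes pandas, a `Month` column already parsed to month-start timestamps, and exactly one row per customer per month with no gaps. The function name `add_lags_and_target`, the `lag_k` / `target` column names, and the choice of three lags are hypothetical, not part of the original question.

```python
import pandas as pd

def add_lags_and_target(df, n_lags=3, horizon=12):
    """Add lag_1..lag_n features and a forward cumulative-sales target.

    Assumes one row per customer per month, with no missing months.
    """
    df = df.sort_values(["CustomerID", "Month"]).copy()
    grouped = df.groupby("CustomerID")["NetSales"]

    # Lagged features: NetSales of the same customer k months earlier.
    for k in range(1, n_lags + 1):
        df[f"lag_{k}"] = grouped.shift(k)

    # Target: sum of NetSales over the NEXT `horizon` months (current month excluded).
    # rolling(horizon).sum() at row t covers t-horizon+1..t; shifting it back by
    # `horizon` rows makes row t hold the sum over t+1..t+horizon.
    df["target"] = grouped.transform(
        lambda s: s.rolling(horizon).sum().shift(-horizon)
    )
    return df

# Toy data shaped like the table above: 3 customers, JAN-15 to SEP-17.
months = pd.date_range("2015-01-01", "2017-09-01", freq="MS")
raw = pd.DataFrame(
    [(m, c, 10) for m in months for c in "ABC"],
    columns=["Month", "CustomerID", "NetSales"],
)
features = add_lags_and_target(raw)
```

Rows whose target is NaN (here, every month after SEP-16) do not have 12 full months of future data and would have to be dropped before training or evaluation.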

How can I split it into TRAIN / VAL / TEST in a way that is consistent with time? Can I do something like this?

  • TRAIN --> all customers / months from JAN-15 to MAR-16 (I take each month at least once so the model will learn seasonal patterns)
  • VAL --> all customers / months from APR-16 to JUN-16
  • TEST --> all customers / months from JUL-16 to SEP-16 (I stop here because I need the following 12 months to create the target variable)
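
The split described in the list above could be sketched as follows. This is only an illustration, assuming pandas and a `Month` column of month-start timestamps; the helper `split_by_month` and its inclusive end-date arguments are hypothetical.

```python
import pandas as pd

def split_by_month(df, train_end, val_end, test_end):
    """Chronological split on the Month column; every end date is inclusive."""
    train_end, val_end, test_end = map(pd.Timestamp, (train_end, val_end, test_end))
    train = df[df["Month"] <= train_end]
    val = df[(df["Month"] > train_end) & (df["Month"] <= val_end)]
    test = df[(df["Month"] > val_end) & (df["Month"] <= test_end)]
    return train, val, test

# Toy data shaped like the table above: 3 customers, JAN-15 to SEP-17.
months = pd.date_range("2015-01-01", "2017-09-01", freq="MS")
df = pd.DataFrame(
    [(m, c, 10) for m in months for c in "ABC"],
    columns=["Month", "CustomerID", "NetSales"],
)

# TRAIN: JAN-15..MAR-16, VAL: APR-16..JUN-16, TEST: JUL-16..SEP-16
train, val, test = split_by_month(df, "2016-03-01", "2016-06-01", "2016-09-01")
```

Months after SEP-16 end up in none of the three sets, which matches the point that they are only needed to build the 12-month target.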

Is this a consistent split strategy? Alternatively, what would you advise?

Thanks a lot, Andrea

Is this a consistent split strategy?

Yes. Your validation set contains no data from before your training period, and your test set contains none from before your validation period. That ordering prevents data leakage, so this is the right way to do it.
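
As a small sanity check of that ordering, something like the following could be run on the three sets. It is a sketch assuming pandas and a `Month` timestamp column; `is_chronological` and `frame` are hypothetical helpers, and the check only looks at row dates.

```python
import pandas as pd

def is_chronological(*parts, time_col="Month"):
    """True if every part ends strictly before the next one begins."""
    return all(
        earlier[time_col].max() < later[time_col].min()
        for earlier, later in zip(parts, parts[1:])
    )

def frame(start, end):
    # Toy frame holding only the Month column, one row per month.
    return pd.DataFrame({"Month": pd.date_range(start, end, freq="MS")})

train = frame("2015-01-01", "2016-03-01")
val = frame("2016-04-01", "2016-06-01")
test = frame("2016-07-01", "2016-09-01")

print(is_chronological(train, val, test))  # True
print(is_chronological(val, train, test))  # False
```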

Alternatively, what would you advise?

The only thing you could change is the proportion of data in your train, val, and test sets, and that is something you can experiment with. Because this is a time series, make sure the seasonal patterns are all covered by your training data.
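
One way to check the seasonality point when trying different proportions is to verify that the training window contains every calendar month at least once. A minimal sketch, assuming pandas; `covers_all_calendar_months` is a hypothetical helper name.

```python
import pandas as pd

def covers_all_calendar_months(months):
    """True if the given dates include each of the 12 calendar months at least once."""
    seen = {int(m) for m in pd.DatetimeIndex(months).month}
    return seen == set(range(1, 13))

# Training window from the question: JAN-15 to MAR-16 (15 months).
train_months = pd.date_range("2015-01-01", "2016-03-01", freq="MS")
# A shorter candidate window: JAN-15 to SEP-15 (9 months).
short_months = pd.date_range("2015-01-01", "2015-09-01", freq="MS")

print(covers_all_calendar_months(train_months))  # True
print(covers_all_calendar_months(short_months))  # False
```

Covering each month once is a minimum; with only one observation per calendar month, the model sees each seasonal position a single time.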
