I have a data set covering several months (from JAN-15 to SEPT-17), reporting each customer's financial situation for each month. My task is to predict the cumulative sales for each customer for the next 12 months.
My dataset looks like this (this is the raw data; for training I will create lagged features):
Month CustomerID NetSales
JAN-15 A 10
JAN-15 B 10
JAN-15 C 10
FEB-15 A 10
FEB-15 B 10
FEB-15 C 10
...
How can I split it into TRAIN / VAL / TEST in a way that is consistent with time? Can I do something like this?
Is this a consistent split strategy? Alternatively, what would you advise?
Thanks a lot, Andrea
Is this a consistent split strategy?
Yes. None of the data in your validation set comes before your training data, and the same holds for your test set relative to your validation set. This prevents data leakage, so it is the right way to do it.
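To make the idea concrete, here is a minimal sketch of a time-ordered split with pandas. The toy data only mimics the shape of the table in the question, and the two cutoff dates are hypothetical, picked purely for illustration. It also assumes `Month` has already been parsed into real dates.

```python
import pandas as pd

# Toy data in the same shape as the question's table (values are made up).
months = pd.date_range("2015-01-01", "2017-09-01", freq="MS")
df = pd.DataFrame(
    [(m, c, 10) for m in months for c in "ABC"],
    columns=["Month", "CustomerID", "NetSales"],
)
df["Month"] = pd.to_datetime(df["Month"])

# Hypothetical cutoffs: choose them to suit your own data.
train_end = pd.Timestamp("2016-09-01")
val_end = pd.Timestamp("2017-03-01")

# Each set contains only months strictly later than the previous set.
train = df[df["Month"] <= train_end]
val = df[(df["Month"] > train_end) & (df["Month"] <= val_end)]
test = df[df["Month"] > val_end]

# Sanity checks: the sets are ordered in time and no row is lost.
assert train["Month"].max() < val["Month"].min()
assert val["Month"].max() < test["Month"].min()
assert len(train) + len(val) + len(test) == len(df)
```

Splitting on the month rather than on row position keeps all customers of a given month in the same set.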
Alternatively, what would you advise?
The only thing you could change is the proportions of your train, val and test sets, which you can experiment with. As it is a time series, you should also consider seasonal trends and make sure they are all covered in your training data.
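As a rough sketch of how one might try different proportions and check seasonal coverage: the helper names `split_by_month` and `covers_full_year` are made up for this example, the fractions are arbitrary, and "covers every calendar month at least once" is only one simple proxy for seasonal coverage.

```python
import pandas as pd

def split_by_month(df, train_frac=0.6, val_frac=0.2):
    """Split on ordered unique months so no month straddles two sets."""
    months = df["Month"].drop_duplicates().sort_values()
    n_train = int(len(months) * train_frac)
    n_val = int(len(months) * val_frac)
    train = df[df["Month"].isin(months.iloc[:n_train])]
    val = df[df["Month"].isin(months.iloc[n_train:n_train + n_val])]
    test = df[df["Month"].isin(months.iloc[n_train + n_val:])]
    return train, val, test

def covers_full_year(train):
    """True if every calendar month (Jan..Dec) appears in the training set."""
    return train["Month"].dt.month.nunique() == 12

# Toy data in the same shape as the question's table (values are made up).
all_months = pd.date_range("2015-01-01", "2017-09-01", freq="MS")
df = pd.DataFrame(
    [(m, c, 10) for m in all_months for c in "ABC"],
    columns=["Month", "CustomerID", "NetSales"],
)
df["Month"] = pd.to_datetime(df["Month"])

train, val, test = split_by_month(df, train_frac=0.6, val_frac=0.2)
full_year_ok = covers_full_year(train)

# A much smaller training share no longer spans a whole year.
short_train, _, _ = split_by_month(df, train_frac=0.3, val_frac=0.2)
short_year_ok = covers_full_year(short_train)
```

If the check fails for a given proportion, the training window is shorter than one full seasonal cycle, so enlarging it would be the first thing to try.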