
Noisy train loss after specific epoch in LSTM for time series forecasting (Keras)

I'm training an LSTM model for time series forecasting. This is the training loss plot:

[figure: training loss over epochs]

This is a one-step-ahead forecasting case, so I'm training the model using a rolling window. There are 26 forecast steps, and the model is retrained at every step. As you can see, after roughly epoch 25-27 the training loss suddenly becomes very noisy. Why does this behaviour occur?

P.S. I'm using an LSTM with tanh activation. I also tried L1 and L2 regularization, but the behaviour stayed the same. The layer after the LSTM is a Dense layer with linear activation, a MinMaxScaler is applied to the input data, and the optimizer is Adam. I see the same behaviour on the validation dataset.
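For context, here is a minimal sketch of the setup as described (the unit count, window length, epoch count, and synthetic series are illustrative assumptions, not the asker's actual code):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dense

def build_model(window):
    # LSTM with tanh activation followed by a linear Dense head,
    # trained with Adam, as described in the question.
    model = Sequential([
        Input(shape=(window, 1)),
        LSTM(32, activation="tanh"),
        Dense(1, activation="linear"),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

# Illustrative series; MinMaxScaler applied to the inputs as stated.
series = np.sin(np.linspace(0, 20, 300)).reshape(-1, 1)
scaled = MinMaxScaler().fit_transform(series)

# Rolling one-step-ahead loop: at each of the 26 forecast steps the
# model is rebuilt and retrained on the data available up to that step.
window, n_steps = 30, 26
for step in range(n_steps):
    end = len(scaled) - n_steps + step
    X = np.array([scaled[i:i + window] for i in range(end - window)])
    y = scaled[window:end]
    model = build_model(window)
    model.fit(X, y, epochs=50, verbose=0)
    pred = model.predict(scaled[end - window:end][None, ...], verbose=0)
```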

Are you using gradient clipping? If not, it could help: gradient values can become extremely small or large, making it very difficult for the model to make further learning progress. The recurrent layer may have created a narrow valley in the loss surface that the optimizer keeps overshooting because the gradient steps are too large.
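In Keras, gradient clipping can be enabled directly on the optimizer via the clipnorm or clipvalue arguments; a sketch (the learning rate and clipping thresholds are assumptions you would tune for your data):

```python
from tensorflow.keras.optimizers import Adam

# clipnorm rescales any gradient tensor whose L2 norm exceeds 1.0;
# clipvalue instead clamps every gradient element to [-0.5, 0.5].
optimizer = Adam(learning_rate=1e-3, clipnorm=1.0)
# optimizer = Adam(learning_rate=1e-3, clipvalue=0.5)

model.compile(optimizer=optimizer, loss="mse")
```

If the loss stays noisy even with clipping, lowering the learning rate late in training (e.g. with a learning-rate schedule) is another common way to tame oscillations once the optimizer is near a minimum.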

