I'm training an LSTM model for time series forecasting. The plot above shows the training loss.
This is a one-step-ahead forecasting case, so I'm training the model with a rolling window. There are 26 forecast steps, and at every step I retrain the model. As you can see, after about epoch 25-27 the training loss suddenly becomes very noisy. Why does this happen?
PS: I'm using an LSTM with `tanh` activation. I also tried L1 and L2 regularization, but the behaviour is the same. The layer after the LSTM is a `Dense` layer with `linear` activation, a `MinMaxScaler` is applied to the input data, and the optimizer is `Adam`. I see the same behaviour on the validation dataset too.
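For context, here is roughly how the data preparation looks; the window length, series length, and the synthetic series itself are placeholders, not my actual data:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Placeholder series; the real data is a 1-D series with the same shape
series = np.sin(np.linspace(0, 20, 130)).reshape(-1, 1)

# MinMaxScaler maps the inputs into [0, 1], as in my setup
scaler = MinMaxScaler()
scaled = scaler.fit_transform(series)

# One-step-ahead samples: `window` past values predict the next value
window = 10
X = np.array([scaled[i:i + window] for i in range(len(scaled) - window)])
y = scaled[window:]

print(X.shape, y.shape)  # (120, 10, 1) (120, 1)
```

The resulting `X` has the `(samples, timesteps, features)` shape that a Keras `LSTM` layer expects.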
Are you using gradient clipping? If not, it could help: gradient values can become extremely small or extremely large, making it very difficult for the model to keep improving. The recurrent layer may have created a narrow valley in the loss surface that you keep overshooting because the gradient steps are too large.
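Gradient clipping rescales the gradient whenever its norm exceeds a threshold, so a single large gradient can't throw the weights out of the valley. In Keras you can enable it directly on the optimizer, e.g. `Adam(clipnorm=1.0)` or `Adam(clipvalue=0.5)`. A minimal NumPy illustration of clipping by global norm (the threshold value 1.0 is just an example):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their global L2 norm is at most max_norm."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if global_norm > max_norm:
        scale = max_norm / global_norm
        grads = [g * scale for g in grads]
    return grads

# An "exploding" gradient with norm 5.0 gets rescaled to norm 1.0;
# its direction is preserved, only the magnitude shrinks.
grads = [np.array([3.0, 4.0])]
clipped = clip_by_global_norm(grads, max_norm=1.0)
print(np.linalg.norm(clipped[0]))  # 1.0
```

Gradients already below the threshold pass through unchanged, so clipping only intervenes on the rare large updates.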