
Noisy train loss after specific epoch in LSTM for time series forecasting (Keras)

I'm training an LSTM model for time series forecasting. This is the training loss plot:

[figure: training loss per epoch]

This is a one-step-ahead forecasting case, so I'm training the model using a rolling window. Here, there are 26 forecasting steps (for every step, I retrain the model). As you can see, after epoch #25~27 the training loss suddenly becomes very noisy. Why does this happen?

P.S. I'm using an LSTM with tanh activation. I also tried L1 and L2 regularization, but the behaviour is the same. The layer after the LSTM is a Dense layer with linear activation, a MinMaxScaler is applied to the input data, and the optimizer is Adam. I also see the same behaviour on the validation dataset.
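For reference, a minimal sketch of the setup as described above. The window length, LSTM unit count, regularization strengths, and the toy sine-wave data are illustrative assumptions, not the asker's actual values:

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense
    from tensorflow.keras.regularizers import l1_l2

    WINDOW = 10  # assumed rolling-window length

    def make_windows(series, window=WINDOW):
        # Turn a 1-D series into (samples, window, 1) inputs
        # and one-step-ahead targets.
        X = np.array([series[i:i + window] for i in range(len(series) - window)])
        y = series[window:]
        return X[..., np.newaxis], y

    series = np.sin(np.linspace(0, 50, 500))  # toy data, stands in for the real series
    scaler = MinMaxScaler()
    scaled = scaler.fit_transform(series.reshape(-1, 1)).ravel()
    X, y = make_windows(scaled)

    model = Sequential([
        LSTM(32, activation='tanh', input_shape=(WINDOW, 1),
             kernel_regularizer=l1_l2(l1=1e-5, l2=1e-4)),  # L1 + L2, as in the question
        Dense(1, activation='linear'),
    ])
    model.compile(optimizer='adam', loss='mse')
    history = model.fit(X, y, epochs=50, batch_size=32, verbose=0)

In the rolling-window scheme, this fit would be repeated once per forecasting step on the updated window of data.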

Are you using gradient clipping? If not, it could help, since gradient values can become really, really small or large, making it very difficult for the model to make further progress and learn better. The recurrent layer may have created a valley in the loss surface that you keep missing because the gradient is too large.
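For example, a minimal sketch of gradient clipping with the Keras Adam optimizer. The thresholds below are illustrative starting points, not tuned recommendations:

    from tensorflow.keras.optimizers import Adam

    # Cap the gradient norm so a single large gradient cannot
    # throw the weights out of a narrow loss valley.
    # clipnorm=1.0 is an assumed value; tune it for your data.
    optimizer = Adam(learning_rate=1e-3, clipnorm=1.0)
    # Alternatively, clip each gradient element instead:
    # optimizer = Adam(learning_rate=1e-3, clipvalue=0.5)

    # Re-compile the model from the sketch above with the clipped optimizer.
    model.compile(optimizer=optimizer, loss='mse')

Lowering the learning rate in later epochs can have a similar stabilizing effect, but clipping directly bounds how far any single update can move the weights.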
