
Interpretation of train-validation loss of a Neural Network

I have trained an LSTM model for time-series forecasting. I used early stopping with a patience of 150 epochs and a dropout of 0.2. This is the plot of training and validation loss:

[plot: training and validation loss]

Early stopping halted training after 650 epochs and saved the best weights from around epoch 460, where the validation loss was lowest.
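The patience mechanism described above (as implemented, for example, by Keras's `EarlyStopping` callback with `restore_best_weights=True`) can be sketched in plain Python. The `early_stopping` helper below is illustrative, not the Keras API; it just tracks the best epoch and stops once no improvement has been seen for `patience` epochs:

```python
def early_stopping(val_losses, patience):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch
    validation losses, mimicking a patience-based early-stopping rule."""
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:              # improvement: reset the counter
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch      # no improvement for `patience` epochs
    return len(val_losses) - 1, best_epoch  # ran out of epochs

# Toy curve: improves until epoch 3, then plateaus.
losses = [1.0, 0.8, 0.6, 0.5, 0.55, 0.52, 0.51, 0.53, 0.56]
print(early_stopping(losses, patience=4))  # stops 4 epochs after the best epoch
```

With a patience of 150 and a stop at epoch 650, the best weights landing around epoch 460 is consistent with this rule (the last non-improving stretch need not be exactly `patience` long if intermediate epochs briefly improved).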

My question is: is it normal for the training loss to always be above the validation loss? I know that the opposite (validation loss above training loss) would be a sign of overfitting. But what about this case?

EDIT: My dataset is a time series with hourly frequency, composed of 35,000 instances. I have split the data into 80% train and 20% validation, but in temporal order: for example, the training set contains the data up to the beginning of 2017, and the validation set the data from 2017 to the end. I created this plot by averaging the data over 15-day windows, and this is the result:

[plot: 15-day-averaged series]
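The ordered split described above can be sketched as follows (the `temporal_split` helper is illustrative; the point is that slicing, not shuffling, keeps the validation set strictly later in time):

```python
def temporal_split(series, train_frac=0.8):
    """Split a time-ordered sequence into train/validation without shuffling,
    so the validation set is strictly later in time than the training set."""
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]

hourly = list(range(35_000))          # stand-in for 35,000 hourly instances
train, val = temporal_split(hourly)
print(len(train), len(val))           # 28000 7000
assert max(train) < min(val)          # no temporal leakage
```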

So maybe the reason is, as you said, that the validation data has an easier pattern. How can I solve this problem?

In most cases, the validation loss should be higher than the training loss, because the labels of the training set are accessible to the model. In fact, a good habit when training a new network is to use a small subset of the data and check whether the training loss can converge to 0 (i.e., the model fully overfits that subset). If it cannot, the model is somehow unable to memorize the data.
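That sanity check can be illustrated with a deliberately tiny problem. Below is a minimal sketch, using 1-D linear regression fit by gradient descent instead of an LSTM (an assumption made purely to keep the example self-contained); the same idea applies to overfitting a network on a few dozen sequences:

```python
# Sanity check: if a model cannot drive training loss near zero on a tiny,
# perfectly learnable subset, it cannot memorize the data.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]          # exactly y = 2x, so loss should reach ~0

def mse(w):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

w = 0.0
for _ in range(200):
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= 0.05 * grad          # plain gradient descent

print(round(w, 3), round(mse(w), 6))
```

If the loss plateaus well above zero even on a toy subset like this, the problem is capacity or a bug, not the train/validation split.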

Let's go back to your problem. A validation loss lower than the training loss does happen, but possibly not because of your model: it can come from how you split the data. Suppose there are two types of patterns (A and B) in the dataset, and you split in a way that the training set contains both pattern A and pattern B while the small validation set contains only pattern B. In that case, if B is easier to recognize, you will get a higher training loss.

In a more extreme example, suppose pattern A is almost impossible to recognize but makes up only 1% of the dataset, and the model can recognize all of pattern B. If the validation set happens to contain only pattern B, the validation loss will be smaller.

As alex mentioned, using K-fold cross-validation is a good way to make sure every sample is used as both validation and training data. Printing the confusion matrix to check that all labels are relatively balanced is another method worth trying.
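A minimal sketch of K-fold index generation (contiguous, unshuffled folds; the `kfold_indices` helper is illustrative, not a library API). Note that for a temporal dataset like this one, forward-chaining splits such as scikit-learn's `TimeSeriesSplit` are usually preferred, so that validation data never precedes the training window:

```python
def kfold_indices(n, k):
    """Yield (train_idx, val_idx) pairs such that every sample appears in
    the validation set exactly once (contiguous folds, no shuffling)."""
    fold = n // k
    for i in range(k):
        start = i * fold
        stop = (i + 1) * fold if i < k - 1 else n
        val = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n))
        yield train, val

for train, val in kfold_indices(10, 5):
    print(val)   # each fold's validation indices
```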

Usually the opposite is true. But since you are using dropout, it is common for the validation loss to be less than the training loss: dropout is active while the training loss is computed but disabled during evaluation. And, as others have suggested, try k-fold cross-validation.
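The asymmetry can be seen in a minimal sketch of inverted dropout (the `dropout` helper is illustrative, not a framework API): the training-mode forward pass zeroes and rescales activations, while the evaluation-mode pass leaves them untouched, so the training loss is measured on a handicapped network.

```python
import random

def dropout(activations, rate, training):
    """Inverted dropout: during training, zero each unit with probability
    `rate` and scale survivors by 1/(1 - rate); at evaluation time, pass
    activations through unchanged."""
    if not training:
        return list(activations)      # evaluation: dropout disabled
    keep = 1.0 - rate
    return [a / keep if random.random() < keep else 0.0
            for a in activations]

acts = [0.5, 1.0, 1.5, 2.0]
print(dropout(acts, rate=0.2, training=False))  # unchanged at eval time
print(dropout(acts, rate=0.2, training=True))   # some zeros, rest rescaled
```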

