
Why do more epochs make my model worse?

Most of my code is based on this article and the issue I'm asking about is evident there, but also in my own testing. It is a sequential model with LSTM layers.

Here is a plotted prediction over real data from a model that was trained with around 20 small data sets for one epoch.

Here is another plot, but this time with a model trained on more data for 10 epochs.


What causes this and how can I fix it? Also, that first link I sent shows the same result at the bottom - 1 epoch does great and 3500 epochs is terrible.

Furthermore, when I run a training session for the higher data count but with only 1 epoch, I get identical results to the second plot.

What could be causing this issue?

A few questions:

  • Is this graph for training data or validation data?
  • Do you consider it better because:
    • The graph seems cool?
    • You actually have a better "loss" value?
      • If so, was it training loss?
      • Or validation loss?

Cool graph

The early graph seems interesting, indeed, but take a close look at it:

I clearly see huge predicted valleys where the expected data should be a peak

Is this really better? It sounds like a random wave that is completely out of phase, meaning that a straight line would indeed represent a better loss than this.
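To make that concrete, here is a quick NumPy check (not from the original post): against a sine-wave target, a prediction that is the same wave shifted half a period scores a much worse mean squared error than a flat line at the target's mean.

    import numpy as np

    # Toy check: against a sine target, an out-of-phase wave loses to a flat line.
    t = np.linspace(0, 4 * np.pi, 200)
    target = np.sin(t)

    out_of_phase = np.sin(t + np.pi)                # same shape, half a period late
    straight_line = np.full_like(t, target.mean())  # constant prediction

    mse = lambda pred: np.mean((target - pred) ** 2)
    print(f"out-of-phase MSE:  {mse(out_of_phase):.3f}")   # ~2.0
    print(f"straight-line MSE: {mse(straight_line):.3f}")  # ~0.5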

Take a look at the "training loss" - this is what can surely tell you if your model is better or not.

If this is the case and your model isn't reaching the desired output, then you should probably make a more capable model (more layers, more units, a different method, etc.). But be aware that many datasets are simply too random to be learned, no matter how good the model.
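As a rough sketch of the "more capable model" idea above (the layer sizes and the input shape of 50 timesteps with 1 feature are illustrative assumptions, not from the question):

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense

    # A wider, deeper variant: stacked LSTM layers with more units.
    model = Sequential([
        LSTM(128, return_sequences=True, input_shape=(50, 1)),  # assumed input shape
        LSTM(64),    # extra stacked recurrent layer
        Dense(1),    # single-value regression output
    ])
    model.compile(optimizer="adam", loss="mse")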

Overfitting - Training loss gets better, but validation loss gets worse

In case you actually have a better training loss: OK, so your model is indeed getting better.

  • Are you plotting training data? - Then this straight line is actually better than a wave out of phase
  • Are you plotting validation data?
    • What is happening with the validation loss? Better or worse?

If your "validation" loss is getting worse, your model is overfitting. 如果您的“验证”损失越来越严重,则表明您的模型过度拟合。 It's memorizing the training data instead of learning generally. 它是在记忆训练数据,而不是一般地学习。 You need a less capable model, or a lot of "dropout". 您需要功能较弱的模型,或大量的“辍学”模型。

Often, there is an optimal point where the validation loss stops going down, while the training loss keeps going down. This is the point to stop training if you're overfitting. Read about the EarlyStopping callback in the Keras documentation.
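For example, stopping when the validation loss stops improving might look like this (x_train, y_train, x_val, y_val stand in for your own data):

    from tensorflow.keras.callbacks import EarlyStopping

    early_stop = EarlyStopping(
        monitor="val_loss",         # watch validation loss, not training loss
        patience=5,                 # allow 5 epochs without improvement
        restore_best_weights=True,  # roll back to the best epoch seen
    )

    # x_train / y_train / x_val / y_val are placeholders for your own data.
    model.fit(x_train, y_train,
              validation_data=(x_val, y_val),
              epochs=100,
              callbacks=[early_stop])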

Bad learning rate - Training loss is going up indefinitely

If your training loss is going up, then you've got a real problem there: either a bug, a badly prepared calculation somewhere (if you're using custom layers), or simply a learning rate that is too big.

Reduce the learning rate (divide it by 10, or 100), create and compile a "new" model and restart training.
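A sketch of that restart, assuming a hypothetical build_model() helper that recreates your architecture (Keras's Adam default learning rate is 1e-3, so dividing by 10 gives 1e-4):

    from tensorflow.keras.optimizers import Adam

    model = build_model()  # hypothetical helper that rebuilds the same architecture
    model.compile(optimizer=Adam(learning_rate=1e-4), loss="mse")  # 10x smaller than default
    model.fit(x_train, y_train, epochs=20)  # restart training from scratch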

Another problem?

Then you need to detail your question properly.
