
Validation loss and accuracy have a lot of 'jumps'

Hello everyone, so I made this CNN model.

My data:

Train folder -> 30 classes -> 800 images each -> 24000 altogether

Validation folder -> 30 classes -> 100 images each -> 3000 altogether

Test folder -> 30 classes -> 100 images each -> 3000 altogether

- I've applied data augmentation (on the train data).

- I've got 5 conv layers with filters 32 -> 64 -> 128 -> 128 -> 128, each followed by max pooling and batch normalization.

- Added dropout of 0.5 after the flattening layer.
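
For concreteness, here is a minimal Keras sketch of the setup described above; the input size (150x150 RGB), the 3x3 kernels, the ReLU activations, and the dense head are assumptions, since only the filter counts, pooling, batch normalization, and the 0.5 dropout are specified:

```python
# A minimal Keras sketch of the setup described above. The input size
# (150x150 RGB), 3x3 kernels, ReLU activations, and the dense head are
# assumptions; only the filter counts, pooling, batch normalization,
# and the 0.5 dropout after flattening come from the description.
from tensorflow.keras import layers, models

model = models.Sequential()
model.add(layers.Input(shape=(150, 150, 3)))       # assumed input size
for filters in (32, 64, 128, 128, 128):            # the five conv blocks
    model.add(layers.Conv2D(filters, (3, 3), activation="relu", padding="same"))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.BatchNormalization())
model.add(layers.Flatten())
model.add(layers.Dropout(0.5))                      # dropout after flattening
model.add(layers.Dense(30, activation="softmax"))   # 30 classes
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```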

The training part looks good. The validation part has a lot of 'jumps', though. Does it overfit?
Is there any way to fix this and make the validation part more stable?

Note: I plan to increase the epochs on my final model; I'm just experimenting to see what works best, since the model takes a lot of time to train. So for now I train with 20 epochs.

[Image: training and validation loss/accuracy curves over the 20 epochs]

The training part looks good. The validation part has a lot of 'jumps', though. Does it overfit?

The answer is yes. The so-called 'jumps' in the validation curves may indicate that the model is not generalizing well to the validation data, and therefore your model might be overfitting.

Is there any way to fix this and make the validation part more stable?

To fix this, you can try the following:

  • Increasing the size of your training set
  • Regularization techniques
  • Early stopping (see the callback sketch after this list)
  • Reducing the complexity of your model
  • Using different hyperparameters, such as the learning rate
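
A minimal sketch of early stopping combined with learning-rate scheduling in Keras; the patience values are assumptions, and `train_data`/`val_data` are hypothetical stand-ins for your own datasets or generators:

```python
# A minimal sketch of early stopping plus learning-rate scheduling in
# Keras. The patience values are assumptions, and `train_data` /
# `val_data` are hypothetical stand-ins for your own datasets.
from tensorflow.keras import callbacks

early_stop = callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)
reduce_lr = callbacks.ReduceLROnPlateau(
    monitor="val_loss", factor=0.5, patience=2, min_lr=1e-6)

history = model.fit(
    train_data,                       # hypothetical training dataset
    validation_data=val_data,         # hypothetical validation dataset
    epochs=50,                        # early stopping will cut this short
    callbacks=[early_stop, reduce_lr])
```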

I've applied data augmentation (on the train data).

What does this mean? What kind of data did you add, and how much? You might think I'm nitpicking, but if the distribution of the augmented data is different enough from the original data, then this will indeed cause your model to generalize poorly to the validation set.
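
For illustration, a deliberately conservative augmentation setup in Keras might look like the sketch below; the transform types and ranges are assumptions, since you haven't said which augmentations you used. The point is that aggressive ranges can push the training distribution away from the validation distribution:

```python
# For illustration only: a conservative Keras augmentation setup. The
# transform types and ranges are assumptions; the question doesn't say
# which augmentations were applied. Aggressive ranges can shift the
# training distribution away from the validation distribution.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=10,          # small rotations only
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True)

# The validation data should only be rescaled, never augmented.
val_gen = ImageDataGenerator(rescale=1.0 / 255)
```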

Increasing your epochs isn't going to help here; your training loss is already decreasing reasonably. Training your model for longer is a good step if the validation loss is also decreasing nicely, but that's obviously not the case.

Some things I would personally try:

  1. Try decreasing the learning rate (see the sketch after this list).
  2. Try training the model without the augmented data and see how the validation loss behaves.
  3. Try splitting the augmented data so that it's also contained in the validation set and see how the model behaves.
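
For suggestion 1, a minimal sketch of recompiling with a smaller learning rate; 1e-4 is an assumed starting point (Keras's Adam default is 1e-3):

```python
# A sketch of suggestion 1: recompiling with a smaller learning rate.
# 1e-4 is an assumed starting point (Keras's Adam default is 1e-3).
from tensorflow.keras.optimizers import Adam

model.compile(optimizer=Adam(learning_rate=1e-4),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```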
