Tensorflow Keras - high accuracy during training, low accuracy during prediction

Question

I have a very basic multiclass CNN model for classifying vehicles into 4 classes [pickup, sedan, suv, van] that I have written using Tensorflow 2.0 tf.keras:

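import tensorflow as tf

# cfg_data_fmt is not defined in the post; 'channels_first' is assumed here
# because every other layer uses that data format.
cfg_data_fmt = 'channels_first'
he_initialiser = tf.keras.initializers.VarianceScaling()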
model = tf.keras.Sequential()
model.add(tf.keras.layers.Conv2D(32, kernel_size=(3,3), input_shape=(3,128,128), activation='relu', padding='same', data_format='channels_first', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.Conv2D(32, kernel_size=(3,3), activation='relu', padding='same', data_format='channels_first', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.MaxPooling2D((2, 2), data_format=cfg_data_fmt))
model.add(tf.keras.layers.Conv2D(64, kernel_size=(3,3), activation='relu', padding='same', data_format='channels_first', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.Conv2D(64, kernel_size=(3,3), activation='relu', padding='same', data_format='channels_first', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.MaxPooling2D((2, 2), data_format=cfg_data_fmt))
model.add(tf.keras.layers.Conv2D(128, kernel_size=(3,3), activation='relu', padding='same', data_format='channels_first', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.Conv2D(128, kernel_size=(3,3), activation='relu', padding='same', data_format='channels_first', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.MaxPooling2D((2, 2), data_format='channels_first'))
model.add(tf.keras.layers.Flatten(data_format='channels_first'))
model.add(tf.keras.layers.Dense(128, activation='relu', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.Dense(128, activation='relu', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.Dense(4, activation='softmax', kernel_initializer=he_initialiser))

I use the following configuration for training:

Image size: 3x128x128 (planar data)
Number of epochs: 45
Batch size: 32
Loss function: tf.keras.losses.CategoricalCrossentropy(from_logits=True)
Optimizer: tf.optimizers.Adam
Training data size: 67.5% of all data
Validation data size: 12.5% of all data
Test data size: 20% of all data

I have an unbalanced dataset, which has the following distribution:

pickups: 1202
sedans: 1954
suvs: 2510
vans: 196

For this reason I have used class weights to mitigate this imbalance:

pickup_weight: 4.87
sedan_weight: 3.0
suv_weight: 2.33
van_weight: 30.0
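The weights listed above are close to what total / count gives for each class. The post does not say how they were derived, so the formula below is an inference from the numbers, not the asker's stated method:

```python
# Class counts as given in the question.
counts = {"pickup": 1202, "sedan": 1954, "suv": 2510, "van": 196}
total = sum(counts.values())

# Assumed formula: weight = total number of images / images in that class.
weights = {name: total / n for name, n in counts.items()}

# Average weight seen per training sample when every image is used once.
mean_weight = sum(weights[name] * n for name, n in counts.items()) / total

for name, w in weights.items():
    print(f"{name}_weight: {w:.2f}")
print(f"mean weight per sample: {mean_weight:.2f}")
```

With this formula the average weight per sample equals the number of classes, 4.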

This seems like a small dataset, but I am using it for fine tuning: I first train the model on a larger dataset of 16k images of these classes, though with images of vehicles taken from different angles compared to my fine-tune dataset.

Now the questions that I'm having stem from the following observations:

At the end of the final epoch, the results returned by model.fit gave:

training accuracy of 0.9229
training loss of 3.5055
validation accuracy of 0.7906
validation loss of 0.9382
training precision for class pickup of 0.9186
training precision for class sedan of 0.9384
training precision for class suv of 0.9196
training precision for class van of 0.8378
validation precision for class pickup of 0.7805
validation precision for class sedan of 0.8026
validation precision for class suv of 0.8029
validation precision for class van of 0.4615

The results returned by model.evaluate on my hold-out test set after training gave accuracy and loss values similar to the corresponding validation values in the last epoch, and the precision values for each class were also nearly identical to the corresponding validation precisions.

The lower, but still high enough, validation accuracy leads me to believe there is no overfitting problem as the model can generalize.

My first question is how can the validation loss be so much lower than the training loss?

Furthermore, when I created a confusion matrix using:

test_images = np.array([x[0].numpy() for x in list(labeled_ds_test)])
test_labels = np.array([x[1].numpy() for x in list(labeled_ds_test)])
test_predictions = model.predict(test_images, batch_size=32)
print(tf.math.confusion_matrix(tf.argmax(test_labels, 1), tf.argmax(test_predictions, 1)))

The results I got back were:

tf.Tensor(
[[ 42  85 109   3]
 [ 72 137 177   4]
 [ 91 171 228  11]
 [  9  12  16   1]], shape=(4, 4), dtype=int32)

This shows an accuracy of only 35%!!
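The 35% figure can be reproduced from the matrix itself. The sketch below also computes the accuracy that would be expected if the predictions were statistically independent of the labels, given the same row and column totals; that second quantity is an added diagnostic, not something from the post:

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows: true class, columns: predicted class).
cm = np.array([
    [42, 85, 109, 3],
    [72, 137, 177, 4],
    [91, 171, 228, 11],
    [9, 12, 16, 1],
])

total = cm.sum()
accuracy = np.trace(cm) / total  # correct predictions lie on the diagonal

# Accuracy expected if predictions were independent of the true labels,
# keeping the same per-class label totals and prediction totals.
row_totals = cm.sum(axis=1)
col_totals = cm.sum(axis=0)
chance_accuracy = float(np.sum(row_totals * col_totals)) / float(total) ** 2

print(f"accuracy: {accuracy:.4f}")
print(f"accuracy expected under independence: {chance_accuracy:.4f}")
```

Both values come out at roughly 0.35. That is the pattern one would expect if images and labels were paired out of order. One possible cause, which the post does not confirm, is that the snippet iterates labeled_ds_test twice: if the dataset reshuffles on each iteration, test_images and test_labels would come from two different orderings.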

My second question is therefore this: how can the accuracy given by model.predict be so small when during training and evaluation the values seemed to indicate that my model was quite precise with its predictions?

Am I using the predict method wrong or is my theoretical understanding of what's expected to happen completely off?

I am at a bit of a loss here and would greatly appreciate any feedback. Thanks for reading this.

Answer 1

I agree with @gallen. There are several reasons that can cause overfitting and several methods for preventing it. One good solution is adding dropout between layers. You can see the stackoverflow answer and the towardsdatascience article.
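As a rough illustration of what dropout does, here is a minimal NumPy sketch of inverted dropout. In tf.keras the equivalent would be a tf.keras.layers.Dropout layer placed between existing layers; the function name, the rate of 0.5, and the sample input below are chosen only for illustration:

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Zero a random fraction `rate` of activations and rescale the rest."""
    if not training or rate == 0.0:
        # At prediction time the activations pass through unchanged.
        return x
    keep_mask = rng.random(x.shape) >= rate
    # Rescaling keeps the expected activation the same as without dropout.
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones((4, 5))  # stand-in for a layer's output

dropped = dropout(activations, rate=0.5, rng=rng)
unchanged = dropout(activations, rate=0.5, rng=rng, training=False)
print(dropped)
```

Because units are dropped only while training, the network cannot rely on any single activation, which is why dropout is commonly used against overfitting.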

Answer 2

There is overfitting of course, but let's answer the questions.

For the first question, the small amount of validation data plays a role in why its loss is lower than the training loss, as the loss is the sum of all differences between y_true and y_pred.
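This explanation is worth cross-checking. By default, Keras loss classes such as CategoricalCrossentropy report an average over samples, so the size of a set alone should not change the value. Class weights are another possible contributor: when they are passed to model.fit as class_weight, they scale the training loss, while the validation loss is unweighted unless sample weights are supplied with the validation data. The NumPy sketch below illustrates both effects. It assumes the weights are total / count, and the predicted probabilities are made up for illustration, not taken from the asker's model:

```python
import numpy as np

def mean_cross_entropy(y_true, y_prob, sample_weight=None):
    """Categorical cross-entropy per sample, averaged over the set."""
    per_sample = -np.sum(y_true * np.log(y_prob), axis=1)
    if sample_weight is not None:
        per_sample = per_sample * sample_weight
    return float(per_sample.mean())

# Class counts from the question; one label per image.
counts = np.array([1202, 1954, 2510, 196])
labels = np.repeat(np.arange(4), counts)
y_true = np.eye(4)[labels]

# Made-up predictions: 0.4 on the true class, 0.2 on each other class.
y_prob = 0.2 + 0.2 * y_true

loss_large = mean_cross_entropy(y_true, y_prob)
loss_small = mean_cross_entropy(y_true[::10], y_prob[::10])  # 10x fewer samples

# Assumed class weights (total / count), applied per sample.
class_weights = counts.sum() / counts
loss_weighted = mean_cross_entropy(y_true, y_prob, sample_weight=class_weights[labels])

print(f"unweighted, full set:  {loss_large:.4f}")
print(f"unweighted, small set: {loss_small:.4f}")
print(f"class-weighted:        {loss_weighted:.4f}")
```

In this sketch the averaged loss is the same for both set sizes, and the weighted loss is 4 times the unweighted one. For comparison, the reported training loss divided by 4 is 3.5055 / 4 ≈ 0.876, close to the validation loss of 0.9382.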

As for the second question: how can the test accuracy be lower than expected even if validation doesn't show any sign of overfitting?

The distribution of the validation set must be the same as that of the test set for it not to be misleading.

So my advice is to check the distribution of the train, validation, and test datasets separately. Make sure that they are the same.
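A minimal sketch of such a check, using only the standard library. The label lists here are hypothetical placeholders; in practice they would be the class labels collected from each split:

```python
from collections import Counter

def class_fractions(labels):
    """Return the share of each class in a list of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: counts[cls] / total for cls in sorted(counts)}

# Hypothetical labels for each split, for illustration only.
splits = {
    "train": ["suv"] * 43 + ["sedan"] * 33 + ["pickup"] * 21 + ["van"] * 3,
    "validation": ["suv"] * 9 + ["sedan"] * 7 + ["pickup"] * 4 + ["van"] * 1,
    "test": ["suv"] * 13 + ["sedan"] * 10 + ["pickup"] * 6 + ["van"] * 1,
}

fractions = {name: class_fractions(labels) for name, labels in splits.items()}
for name, frac in fractions.items():
    summary = ", ".join(f"{cls}: {share:.2f}" for cls, share in frac.items())
    print(f"{name:<10} {summary}")
```

Large differences between the rows would suggest that the splits are not drawn from the same distribution.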

Answer 3

You need to divide your dataset properly, for example 70% training and 30% validation, and then check your model on a new set of data as test data. This might be helpful, as machine learning is all about trial and error.
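One way to make such a split while keeping the class proportions equal in both parts is a stratified split. The sketch below is a plain-Python illustration; the function name, the seed, and the use of the question's class counts as input are assumptions made for the example:

```python
import random
from collections import defaultdict

def stratified_split(labels, train_frac=0.7, seed=0):
    """Split sample indices so each class keeps roughly the same share in both parts."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = int(round(train_frac * len(indices)))
        train_idx.extend(indices[:cut])
        val_idx.extend(indices[cut:])
    return train_idx, val_idx

# Labels built from the class counts given in the question.
labels = ["pickup"] * 1202 + ["sedan"] * 1954 + ["suv"] * 2510 + ["van"] * 196
train_idx, val_idx = stratified_split(labels, train_frac=0.7)
print(len(train_idx), len(val_idx))
```

Splitting per class matters most for the rare van class, where a purely random split could leave very few examples in the smaller part.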

Answer 1: 0 2020-06-26 22:33:20
Answer 2: 0 2020-06-26 22:39:28
Answer 3: 0 2021-01-13 09:49:41
