CNN with Keras: high accuracy during training, but low accuracy when testing on the same data set

Question

I am using Google Colab to build a CNN with Keras. The data set contains 3 classes with the same number of images in each class. The images are in my Google Drive, organized as:

Images:
-- class 1
-- class 2
-- class 3

The code to read the data and create the CNN is here:

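# Imports were not shown in the original post; these are assumed from the
# names used below (TensorFlow 2.x with its bundled Keras, plus scikit-learn).
from math import ceil
import numpy as np
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, BatchNormalization, Activation,
                                     MaxPooling2D, Dropout, Flatten, Dense)
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

batch_size = 30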

data = ImageDataGenerator(rescale=1. / 255, 
                          validation_split=0.2)

training_data = data.flow_from_directory('/content/drive/My Drive/Data/Images', 
                                         target_size=(200, 200), shuffle=True, batch_size = batch_size, 
                                         class_mode='categorical', subset='training')

test_data = data.flow_from_directory('/content/drive/My Drive/Data/Images', 
                                     target_size=(200, 200), batch_size = batch_size, shuffle=False,
                                     class_mode='categorical', subset='validation')

numBatchTest = ceil(len(test_data.filenames) / (1.0 * batch_size)) # 1.0 to avoid integer division
numBatchTrain = ceil(len(training_data.filenames) / (1.0 * batch_size)) # 1.0 to avoid integer division

numClasses = 3

Classifier=Sequential()
Classifier.add(Conv2D(32, kernel_size=(5, 5), input_shape=(200, 200, 3)))
Classifier.add(BatchNormalization())
Classifier.add(Activation('relu'))
Classifier.add(MaxPooling2D(pool_size=(2,2)))
Classifier.add(Dropout(0.2))
               
Classifier.add(Conv2D(64, kernel_size=(3, 3)))
Classifier.add(BatchNormalization())
Classifier.add(Activation('relu'))
Classifier.add(MaxPooling2D(pool_size=(2,2)))
Classifier.add(Dropout(0.2))

Classifier.add(Flatten())

Classifier.add(Dense(64, activation='relu'))
Classifier.add(Dense(32, activation='relu'))
Classifier.add(Dense(16, activation='relu'))
Classifier.add(Dense(8, activation='relu'))
Classifier.add(Dense(numClasses, activation='softmax'))

I train the network and use the test data for validation:

MyEpochs = 150
Classifier.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.SGD(learning_rate=0.01), 
              metrics=['accuracy']) 

Classifier.fit(training_data,
                        batch_size = 30,
                        epochs = MyEpochs,
                        validation_data=test_data,
                        shuffle = 1)

The accuracy and validation accuracy are both mostly above 90% in the training output:

Epoch 135/150
4/4 [==============================] - 0s 123ms/step - loss: 0.0759 - accuracy: 0.9750 - val_loss: 0.1891 - val_accuracy: 0.9667
Epoch 136/150
4/4 [==============================] - 0s 124ms/step - loss: 0.1153 - accuracy: 0.9583 - val_loss: 0.2348 - val_accuracy: 0.9333
Epoch 137/150
4/4 [==============================] - 1s 134ms/step - loss: 0.1059 - accuracy: 0.9417 - val_loss: 0.1893 - val_accuracy: 0.9667
Epoch 138/150
4/4 [==============================] - 0s 122ms/step - loss: 0.0689 - accuracy: 0.9833 - val_loss: 0.1991 - val_accuracy: 0.9667
Epoch 139/150
4/4 [==============================] - 1s 131ms/step - loss: 0.0716 - accuracy: 0.9750 - val_loss: 0.2175 - val_accuracy: 0.9333
Epoch 140/150
4/4 [==============================] - 0s 125ms/step - loss: 0.1118 - accuracy: 0.9417 - val_loss: 0.2466 - val_accuracy: 0.9333
Epoch 141/150
4/4 [==============================] - 1s 126ms/step - loss: 0.1046 - accuracy: 0.9417 - val_loss: 0.2351 - val_accuracy: 0.9333
Epoch 142/150
4/4 [==============================] - 0s 120ms/step - loss: 0.0988 - accuracy: 0.9417 - val_loss: 0.1994 - val_accuracy: 0.9333
Epoch 143/150
4/4 [==============================] - 0s 124ms/step - loss: 0.0803 - accuracy: 0.9500 - val_loss: 0.1910 - val_accuracy: 0.9667
Epoch 144/150
4/4 [==============================] - 0s 124ms/step - loss: 0.0786 - accuracy: 0.9750 - val_loss: 0.1908 - val_accuracy: 0.9667
Epoch 145/150
4/4 [==============================] - 0s 124ms/step - loss: 0.0947 - accuracy: 0.9500 - val_loss: 0.4854 - val_accuracy: 0.8667
Epoch 146/150
4/4 [==============================] - 1s 128ms/step - loss: 0.2091 - accuracy: 0.9000 - val_loss: 0.1858 - val_accuracy: 0.9333
Epoch 147/150
4/4 [==============================] - 0s 124ms/step - loss: 0.0838 - accuracy: 0.9417 - val_loss: 0.1779 - val_accuracy: 0.9667
Epoch 148/150
4/4 [==============================] - 1s 128ms/step - loss: 0.0771 - accuracy: 0.9667 - val_loss: 0.1897 - val_accuracy: 0.9667
Epoch 149/150
4/4 [==============================] - 0s 120ms/step - loss: 0.0869 - accuracy: 0.9667 - val_loss: 0.1982 - val_accuracy: 0.9667
Epoch 150/150
4/4 [==============================] - 0s 119ms/step - loss: 0.0809 - accuracy: 0.9500 - val_loss: 0.2615 - val_accuracy: 0.9333

To test the model, I predict the training data again:

training_data.reset()
test_data.reset()

predicted_scores = Classifier.predict(training_data, verbose=1)
predicted_labels = predicted_scores.argmax(axis=1) 

train_labels = []
training_data.reset()

for i in range(0,numBatchTrain):
    train_labels =  np.append(train_labels, (training_data[i][1]).argmax(axis = 1))
print(train_labels)
print(predicted_labels)

acc_score = accuracy_score(train_labels, predicted_labels)
CFM = confusion_matrix(train_labels, predicted_labels)

print("\n", "Accuracy: " + str(format(acc_score,'.3f')))
print("\n", "CFM: \n", confusion_matrix(train_labels, predicted_labels))
print("\n", "Classification report: \n", classification_report(train_labels, predicted_labels))

I had some trouble getting the labels for training_data and test_data: when I just used training_data.labels, they seemed to be in a different order than the images, which is why I loop over the batches to append the labels. When I just use training_data.labels, the result is equally bad. The output from that code is:

4/4 [==============================] - 0s 71ms/step
[0. 2. 2. 0. 0. 1. 2. 2. 2. 1. 0. 0. 0. 1. 2. 0. 2. 0. 0. 1. 1. 1. 0. 0.
 0. 2. 2. 0. 1. 2. 0. 2. 1. 1. 2. 2. 0. 1. 0. 2. 0. 1. 1. 0. 2. 2. 0. 2.
 2. 2. 1. 2. 1. 0. 2. 2. 1. 2. 1. 0. 1. 2. 0. 1. 1. 1. 1. 2. 0. 0. 1. 1.
 1. 1. 1. 1. 2. 0. 0. 2. 2. 0. 1. 1. 1. 0. 2. 1. 2. 1. 2. 1. 1. 2. 0. 2.
 2. 0. 0. 2. 1. 0. 2. 0. 0. 1. 1. 2. 0. 0. 1. 1. 0. 0. 1. 2. 0. 2. 0. 2.]
[2 2 2 0 1 1 0 1 1 0 0 2 0 2 0 0 1 2 2 2 2 0 0 2 1 0 2 2 1 1 0 2 1 1 0 0 1
 0 1 0 2 2 2 1 1 1 0 2 0 1 0 0 2 0 0 0 2 0 1 2 2 1 0 2 2 0 1 0 2 2 0 2 0 0
 1 1 2 2 2 0 2 2 1 0 2 1 2 1 0 1 2 2 0 2 0 2 0 0 1 1 1 1 2 2 0 0 1 1 1 2 0
 0 1 0 1 0 2 0 0 0]

 Accuracy: 0.333

 CFM: 
 [[14 10 16]
 [13 14 13]
 [18 10 12]]

 Classification report: 
               precision    recall  f1-score   support

         0.0       0.31      0.35      0.33        40
         1.0       0.41      0.35      0.38        40
         2.0       0.29      0.30      0.30        40

    accuracy                           0.33       120
   macro avg       0.34      0.33      0.33       120
weighted avg       0.34      0.33      0.33       120

The accuracy for the training and validation data during training is very high, but when I test the model on the same data that was used for training, the accuracy is only 33.3%.

I assume the problem is that the class labels get mixed up somewhere, but I am at a loss as to how to fix it. The data set itself is very simple: building the same CNN in Matlab, I get 100% accuracy for both training and testing data, but I cannot make it work in Python.

Does anyone have suggestions for how to get it running in Python?

Answer 1

You are getting inconsistent results because your training image generator has shuffling enabled. This means that every time you reset your generator, the order of the images changes. This is why, when you run a one-sweep predict with the generator and then reset it and iterate through the batches individually, the two passes do not see the images in the same order. Shuffling is recommended when you use the generator to train on the data, so that the network doesn't just memorize the data coming in.
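To illustrate why a mismatch in ordering alone can produce roughly chance-level accuracy, here is a minimal NumPy sketch. It does not use Keras: the "perfect classifier" and the two permutations are hypothetical stand-ins for the shuffled generator being traversed twice, and the variable names are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# 120 samples in 3 balanced classes, mirroring the 40/40/40 split in the question.
num_samples = 120
true_labels = np.repeat(np.arange(3), num_samples // 3)

# Two independent shuffles: one stands in for the order seen by predict(),
# the other for the order seen when looping over the batches afterwards.
order_during_predict = rng.permutation(num_samples)
order_during_loop = rng.permutation(num_samples)

# Hypothetical perfect classifier: it returns the true label of each sample,
# in whatever order the samples arrive.
predicted_labels = true_labels[order_during_predict]

# Labels collected in a different order no longer line up with the predictions.
mismatched_accuracy = float(
    np.mean(predicted_labels == true_labels[order_during_loop]))

# Labels collected in the same order line up exactly.
matched_accuracy = float(
    np.mean(predicted_labels == true_labels[order_during_predict]))

print(f"mismatched order: {mismatched_accuracy:.3f}")
print(f"matched order:    {matched_accuracy:.3f}")
```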

However, because you are now using the generator for evaluation, you can disable shuffling to ensure the comparison is consistent. If you want this to be reproducible, set the shuffle flag to False. You can do this by creating another image generator and iterating through that:

training_data_noshuffle = data.flow_from_directory('/content/drive/My Drive/Data/Images', 
                                         target_size=(200, 200), shuffle=False, batch_size = batch_size, 
                                         class_mode='categorical', subset='training')
training_data_noshuffle.reset()

predicted_scores = Classifier.predict(training_data_noshuffle, verbose=1)
predicted_labels = predicted_scores.argmax(axis=1) 

train_labels = []
training_data_noshuffle.reset()

for i in range(0,numBatchTrain):
    train_labels =  np.append(train_labels, (training_data_noshuffle[i][1]).argmax(axis = 1))

Once you do this, the labels you collect by looping should be in the same order as the predictions returned by predict.
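With shuffling disabled, the loop over batches may not be needed at all, because the iterator's classes attribute lists one integer label per file in the same fixed order. The helper below is only a sketch of that idea: the function name accuracy_in_fixed_order is made up for this example, and it assumes the generator was created with shuffle=False.

```python
import numpy as np


def accuracy_in_fixed_order(model, generator):
    """Compare predictions with generator.classes for an unshuffled generator.

    Hypothetical helper; it assumes `generator` was built with shuffle=False.
    """
    generator.reset()
    scores = model.predict(generator, verbose=0)
    predicted = np.argmax(scores, axis=1)
    # `classes` holds one integer label per file, in file order. It only
    # matches the prediction order when the generator does not shuffle.
    true = np.asarray(generator.classes)
    accuracy = float(np.mean(predicted == true))
    return accuracy, predicted, true


# Usage with the objects from this thread (needs the trained model and the data):
# acc, predicted_labels, train_labels = accuracy_in_fixed_order(
#     Classifier, training_data_noshuffle)
```

As a cross-check, Classifier.evaluate(training_data) pairs each batch of images with its own labels, so the accuracy it reports should not depend on the shuffle setting.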
