CNN with Keras, high acc during training but low during testing for same data set

Question

I am using Google Colab to build a CNN using Keras. The data set contains 3 classes with the same number of images for each class. The images are in my Google Drive organized as

Images:
-- class 1
-- class 2
-- class 3

The code to read the data and create the CNN is here:

batch_size = 30

data = ImageDataGenerator(rescale=1. / 255, 
                          validation_split=0.2)

training_data = data.flow_from_directory('/content/drive/My Drive/Data/Images', 
                                         target_size=(200, 200), shuffle=True, batch_size = batch_size, 
                                         class_mode='categorical', subset='training')

test_data = data.flow_from_directory('/content/drive/My Drive/Data/Images', 
                                     target_size=(200, 200), batch_size = batch_size, shuffle=False,
                                     class_mode='categorical', subset='validation')

numBatchTest = ceil(len(test_data.filenames) / (1.0 * batch_size)) # 1.0 to avoid integer division
numBatchTrain = ceil(len(training_data.filenames) / (1.0 * batch_size)) # 1.0 to avoid integer division

numClasses = 3

Classifier=Sequential()
Classifier.add(Conv2D(32, kernel_size=(5, 5), input_shape=(200, 200, 3)))
Classifier.add(BatchNormalization())
Classifier.add(Activation('relu'))
Classifier.add(MaxPooling2D(pool_size=(2,2)))
Classifier.add(Dropout(0.2))
               
Classifier.add(Conv2D(64, kernel_size=(3, 3)))
Classifier.add(BatchNormalization())
Classifier.add(Activation('relu'))
Classifier.add(MaxPooling2D(pool_size=(2,2)))
Classifier.add(Dropout(0.2))

Classifier.add(Flatten())

Classifier.add(Dense(64, activation='relu'))
Classifier.add(Dense(32, activation='relu'))
Classifier.add(Dense(16, activation='relu'))
Classifier.add(Dense(8, activation='relu'))
Classifier.add(Dense(numClasses, activation='softmax'))

I train the.network and use the test data as verification:

MyEpochs = 150
Classifier.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.SGD(learning_rate=0.01), 
              metrics=['accuracy']) 

Classifier.fit(training_data,
                        batch_size = 30,
                        epochs = MyEpochs,
                        validation_data=test_data,
                        shuffle = 1)

The accuracy and vaildation accuracy are both above 90% for the training output:

Epoch 135/150
4/4 [==============================] - 0s 123ms/step - loss: 0.0759 - accuracy: 0.9750 - val_loss: 0.1891 - val_accuracy: 0.9667
Epoch 136/150
4/4 [==============================] - 0s 124ms/step - loss: 0.1153 - accuracy: 0.9583 - val_loss: 0.2348 - val_accuracy: 0.9333
Epoch 137/150
4/4 [==============================] - 1s 134ms/step - loss: 0.1059 - accuracy: 0.9417 - val_loss: 0.1893 - val_accuracy: 0.9667
Epoch 138/150
4/4 [==============================] - 0s 122ms/step - loss: 0.0689 - accuracy: 0.9833 - val_loss: 0.1991 - val_accuracy: 0.9667
Epoch 139/150
4/4 [==============================] - 1s 131ms/step - loss: 0.0716 - accuracy: 0.9750 - val_loss: 0.2175 - val_accuracy: 0.9333
Epoch 140/150
4/4 [==============================] - 0s 125ms/step - loss: 0.1118 - accuracy: 0.9417 - val_loss: 0.2466 - val_accuracy: 0.9333
Epoch 141/150
4/4 [==============================] - 1s 126ms/step - loss: 0.1046 - accuracy: 0.9417 - val_loss: 0.2351 - val_accuracy: 0.9333
Epoch 142/150
4/4 [==============================] - 0s 120ms/step - loss: 0.0988 - accuracy: 0.9417 - val_loss: 0.1994 - val_accuracy: 0.9333
Epoch 143/150
4/4 [==============================] - 0s 124ms/step - loss: 0.0803 - accuracy: 0.9500 - val_loss: 0.1910 - val_accuracy: 0.9667
Epoch 144/150
4/4 [==============================] - 0s 124ms/step - loss: 0.0786 - accuracy: 0.9750 - val_loss: 0.1908 - val_accuracy: 0.9667
Epoch 145/150
4/4 [==============================] - 0s 124ms/step - loss: 0.0947 - accuracy: 0.9500 - val_loss: 0.4854 - val_accuracy: 0.8667
Epoch 146/150
4/4 [==============================] - 1s 128ms/step - loss: 0.2091 - accuracy: 0.9000 - val_loss: 0.1858 - val_accuracy: 0.9333
Epoch 147/150
4/4 [==============================] - 0s 124ms/step - loss: 0.0838 - accuracy: 0.9417 - val_loss: 0.1779 - val_accuracy: 0.9667
Epoch 148/150
4/4 [==============================] - 1s 128ms/step - loss: 0.0771 - accuracy: 0.9667 - val_loss: 0.1897 - val_accuracy: 0.9667
Epoch 149/150
4/4 [==============================] - 0s 120ms/step - loss: 0.0869 - accuracy: 0.9667 - val_loss: 0.1982 - val_accuracy: 0.9667
Epoch 150/150
4/4 [==============================] - 0s 119ms/step - loss: 0.0809 - accuracy: 0.9500 - val_loss: 0.2615 - val_accuracy: 0.9333

To test the model, I predict the training data again:

training_data.reset()
test_data.reset()

predicted_scores = Classifier.predict(training_data, verbose=1)
predicted_labels = predicted_scores.argmax(axis=1) 

train_labels = []
training_data.reset()

for i in range(0,numBatchTrain):
    train_labels =  np.append(train_labels, (training_data[i][1]).argmax(axis = 1))
print(train_labels)
print(predicted_labels)

acc_score = accuracy_score(train_labels, predicted_labels)
CFM = confusion_matrix(train_labels, predicted_labels)

print("\n", "Accuracy: " + str(format(acc_score,'.3f')))
print("\n", "CFM: \n", confusion_matrix(train_labels, predicted_labels))
print("\n", "Classification report: \n", classification_report(train_labels, predicted_labels))

I had some trouble getting the labels for training_data and testing_data , they seemed to be in a different order than the images, when I just used training_data.labels , that is why I looped over the batches to append the labels. When I just use training_data.labels , the result is equally bad. The output from that code is:

4/4 [==============================] - 0s 71ms/step
[0. 2. 2. 0. 0. 1. 2. 2. 2. 1. 0. 0. 0. 1. 2. 0. 2. 0. 0. 1. 1. 1. 0. 0.
 0. 2. 2. 0. 1. 2. 0. 2. 1. 1. 2. 2. 0. 1. 0. 2. 0. 1. 1. 0. 2. 2. 0. 2.
 2. 2. 1. 2. 1. 0. 2. 2. 1. 2. 1. 0. 1. 2. 0. 1. 1. 1. 1. 2. 0. 0. 1. 1.
 1. 1. 1. 1. 2. 0. 0. 2. 2. 0. 1. 1. 1. 0. 2. 1. 2. 1. 2. 1. 1. 2. 0. 2.
 2. 0. 0. 2. 1. 0. 2. 0. 0. 1. 1. 2. 0. 0. 1. 1. 0. 0. 1. 2. 0. 2. 0. 2.]
[2 2 2 0 1 1 0 1 1 0 0 2 0 2 0 0 1 2 2 2 2 0 0 2 1 0 2 2 1 1 0 2 1 1 0 0 1
 0 1 0 2 2 2 1 1 1 0 2 0 1 0 0 2 0 0 0 2 0 1 2 2 1 0 2 2 0 1 0 2 2 0 2 0 0
 1 1 2 2 2 0 2 2 1 0 2 1 2 1 0 1 2 2 0 2 0 2 0 0 1 1 1 1 2 2 0 0 1 1 1 2 0
 0 1 0 1 0 2 0 0 0]

 Accuracy: 0.333

 CFM: 
 [[14 10 16]
 [13 14 13]
 [18 10 12]]

 Classification report: 
               precision    recall  f1-score   support

         0.0       0.31      0.35      0.33        40
         1.0       0.41      0.35      0.38        40
         2.0       0.29      0.30      0.30        40

    accuracy                           0.33       120
   macro avg       0.34      0.33      0.33       120
weighted avg       0.34      0.33      0.33       120

The accuracy for training and validation data during the training is very high, but when testing it, using the same data as for the training, the accuracy is only 33.3%.

I assume, that the problem here is, that the class labels get mixed up somewhere, but I am at loss, how to fix it. The data set itself is very simple, building the same CNN in Matlab, I get 100% accuarcy for both training and testing data, but I cannot make it run in Python.

Does anyone have suggestions, how to get it running in Python?

Answer 1

You are getting inconsistent results because your training image generator has shuffling enabled . This means that every time you reset your generator, the order of the images changes. This is why when you use your image generator and do a one-sweep predict versus resetting it again and iterating through each image individually, you will not match the exact order. Shuffling is recommended if you are using the generator to train on the data so that the.network doesn't just memorize the data coming in.

However, because you are now using this for evaluation purposes, you can disable this to ensure consistency in comparison. Therefore, if you want this to be reproducible, set the shuffle flag to False . You can do this by just creating another image generator and iterate through that:

training_data_noshuffle = data.flow_from_directory('/content/drive/My Drive/Data/Images', 
                                         target_size=(200, 200), shuffle=False, batch_size = batch_size, 
                                         class_mode='categorical', subset='training')
training_data_noshuffle.reset()

predicted_scores = Classifier.predict(training_data_noshuffle, verbose=1)
predicted_labels = predicted_scores.argmax(axis=1) 

train_labels = []
training_data_noshuffle.reset()

for i in range(0,numBatchTrain):
    train_labels =  np.append(train_labels, (training_data_noshuffle[i][1]).argmax(axis = 1))

Once you do this, you should see that your labels when you use predict vs. looping will now be consistent with respect to order.

CNN with Keras, high acc during training but low during testing for same data set

Question

1 answers

solution1
0 ACCPTED 2021-08-04 14:36:36

CNN with Keras, high acc during training but low during testing for same data set

Question

1 answers

solution1 0 ACCPTED 2021-08-04 14:36:36

solution1
0 ACCPTED 2021-08-04 14:36:36