
Keras model returns high validation accuracy while training, but accuracy is very low while evaluating

I am trying to train a simple MobileNetV3Small from keras.applications, as shown below:

import os

from tensorflow import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator

base_model = keras.applications.MobileNetV3Small(
    input_shape=INPUT_SHAPE,
    alpha=.125,
    include_top=False,
    classes=1,
    dropout_rate=0.2,
    weights=None)

x = keras.layers.Flatten()(base_model.output)
preds = keras.layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs=base_model.input, outputs=preds)

model.compile(loss="binary_crossentropy",
              optimizer='RMSprop',
              metrics=["binary_accuracy"])

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=40,
    horizontal_flip=True,
    vertical_flip=True,
)

train_generator = train_datagen.flow_from_directory(
    os.path.join(DATA_ROOT, 'train'),
    target_size=(56, 56),
    batch_size=128,
    class_mode="binary",
)

validation_datagen = ImageDataGenerator(rescale=1.0 / 255)
validation_generator = validation_datagen.flow_from_directory(
    os.path.join(DATA_ROOT, 'val'),
    target_size=(56, 56),
    batch_size=128,
    class_mode="binary",
)

model_checkpoint_callback = keras.callbacks.ModelCheckpoint(
    filepath=SAVE_DIR,
    save_weights_only=True,
    monitor='val_binary_accuracy',
    mode='max',
    save_best_only=True)

es_callback = keras.callbacks.EarlyStopping(patience=10)

model.fit(train_generator,
          epochs=100,
          validation_data=validation_generator,
          callbacks=[model_checkpoint_callback, es_callback],
          shuffle=True)

While training, the model reaches a validation accuracy of around 0.94. But when I call model.evaluate on the exact same validation data, the accuracy drops to 0.48, and when I call model.predict with any data it outputs the constant value 0.51...

There is nothing wrong with the learning rate, optimizer or metrics. What could be wrong here?


EDIT:

After training, when I run

pred_results = model.evaluate(validation_generator)
print(pred_results)

it gives me the following output for a network trained for 1 epoch:

6/6 [==============================] - 1s 100ms/step - loss: 0.6935 - binary_accuracy: 0.8461

However, when I save and load the model with either model.save() or tf.keras.models.save_model(), the output becomes something like this:

6/6 [==============================] - 2s 100ms/step - loss: 0.6935 - binary_accuracy: 0.5028
[0.6935192346572876, 0.5027709603309631]

and the output of model.predict(validation_generator) is:

[[0.5080832] [0.5080832] [0.5080832] [0.5080832] . . . [0.5080832] [0.5080832]]
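The exact save/load calls aren't shown above; presumably the round trip is something along these lines (a sketch only, with an illustrative directory name):

import tensorflow as tf

# Presumed save/load round trip (directory name is illustrative):
tf.keras.models.save_model(model, "saved_model_dir")   # or model.save("saved_model_dir")
restored = tf.keras.models.load_model("saved_model_dir")

# Evaluating and predicting with the restored model is where the
# accuracy drops to ~0.50 and the outputs collapse to a constant:
print(restored.evaluate(validation_generator))
print(restored.predict(validation_generator))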


What I've tried so far:

  1. Used tf.keras.utils.image_dataset_from_directory() instead of ImageDataGenerator (steps 1 and 2 are roughly sketched after this list).
  2. Fixed the TensorFlow and NumPy seeds globally.
  3. Found a similar problem in another SO post and decreased the momentum parameter of the MobileNet BatchNormalization layers one by one:
for layer in model.layers[0].layers:
    if type(layer) is tf.keras.layers.BatchNormalization:
        layer.momentum = 0.9
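For reference, the first two steps might look roughly like this (the seed value and the explicit 1/255 rescaling are illustrative, not from the original post):

import os
import numpy as np
import tensorflow as tf

# Step 2: fix the TensorFlow and NumPy seeds globally (seed value illustrative).
np.random.seed(1234)
tf.random.set_seed(1234)

# Step 1: build the validation set with image_dataset_from_directory instead of
# ImageDataGenerator; label_mode="binary" matches the sigmoid/binary_crossentropy setup.
val_ds = tf.keras.utils.image_dataset_from_directory(
    os.path.join(DATA_ROOT, 'val'),
    label_mode="binary",
    image_size=(56, 56),
    batch_size=128,
)
# image_dataset_from_directory does not rescale, so apply the same 1/255 scaling here.
val_ds = val_ds.map(lambda x, y: (x / 255.0, y))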

The first two steps have no effect, but after applying the third step I no longer get the same prediction for every input. However, evaluate() and predict() still give different accuracy values.

It might be worth trying model.save_weights('directory'), then rebuilding your model (I think here that means re-running the base_model = ... code), and loading the weights back with model.load_weights('directory'). That is what I do in my own models, and when I do that, the accuracy/loss stay exactly the same before and after saving and loading.
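A rough sketch of that weights-only round trip, reusing the model-building code from the question (the checkpoint prefix and the build_model helper name are illustrative):

from tensorflow import keras

def build_model():
    # Re-run the same architecture code as in the question.
    base_model = keras.applications.MobileNetV3Small(
        input_shape=INPUT_SHAPE, alpha=.125, include_top=False,
        classes=1, dropout_rate=0.2, weights=None)
    x = keras.layers.Flatten()(base_model.output)
    preds = keras.layers.Dense(1, activation="sigmoid")(x)
    return keras.Model(inputs=base_model.input, outputs=preds)

model.save_weights('my_checkpoint')            # after training

model = build_model()                          # fresh model, same architecture
model.compile(loss="binary_crossentropy", optimizer='RMSprop',
              metrics=["binary_accuracy"])
model.load_weights('my_checkpoint')

# Accuracy/loss should now match the pre-save numbers.
print(model.evaluate(validation_generator))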

If you run pred_results = model.evaluate(validation_generator) right after fitting, the weights in memory at that moment are the ones from the last training epoch. What you have to do after model.fit is load the weights saved by model_checkpoint_callback, with something like:

model.load_weights(SAVE_DIR)  # restore the best checkpoint, then evaluate
pred_results = model.evaluate(validation_generator)
print(pred_results)

Have you tried setting shuffle=False in validation_datagen.flow_from_directory()? It's a little misleading, but the .flow_from_directory() method shuffles by default, which is problematic when generating your validation dataset: it shuffles your validation data every time you call .predict, whereas in your training loop the .fit method implicitly DOESN'T shuffle the validation set.
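For example (the manual accuracy check at the end is just one way to see why the ordering matters, not part of the original answer):

validation_generator = validation_datagen.flow_from_directory(
    os.path.join(DATA_ROOT, 'val'),
    target_size=(56, 56),
    batch_size=128,
    class_mode="binary",
    shuffle=False,   # keep file order fixed so predictions line up with labels
)

# With shuffle=False, the i-th prediction corresponds to the i-th entry of
# validation_generator.classes, so accuracy can be checked by hand:
preds = model.predict(validation_generator)
manual_acc = ((preds.ravel() > 0.5).astype(int) == validation_generator.classes).mean()
print(manual_acc)  # should now agree with model.evaluate(validation_generator)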

The reason I think this is the issue is that you state calling .predict() on the validation set nets you ~0.5 accuracy, and you're also running a binary classification (sigmoid output with binary cross-entropy loss), which makes perfect sense IF you're (mistakenly) shuffling your validation data. An untrained binary classifier on a balanced dataset will usually land around 50% accuracy (0.5 for 0, 0.5 for 1), since it's just guessing at that point.

Source: I've built and trained a lot of image classification models before, and this has happened to me a lot.
