TensorFlow model.evaluate() gives a different result from that obtained during training

I am using TensorFlow to do multi-class classification.

I load the training and validation datasets in the following way:

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="training",
  shuffle=True,
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)

val_ds = tf.keras.preprocessing.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="validation",
  shuffle=True,
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)

Then I train the model using model.fit():

history = model.fit(
  train_ds,
  validation_data=val_ds,
  epochs=epochs,
  shuffle=True
)

I get a validation accuracy of around 95%.

But when I load the same validation set and use model.evaluate():

model.evaluate(val_ds)

I get a very low accuracy (around 10%).

Why am I getting such different results? Am I using the model.evaluate() function incorrectly?

Note: In model.compile() I specify the following: optimizer - Adam, loss - SparseCategoricalCrossentropy, metric - accuracy.
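
For reference, a minimal sketch of the compile call described in the note. Whether from_logits=True is needed depends on the model's final layer, which the question does not show, so that flag is an assumption here:

model.compile(
  optimizer='adam',
  # assumption: the final Dense layer outputs raw logits (no softmax);
  # use from_logits=False (the default) if the model ends with a softmax
  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
  metrics=['accuracy'])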

Model.evaluate() output

41/41 [==============================] - 5s 118ms/step - loss: 0.3037 - accuracy: 0.1032
Test Loss -  0.3036555051803589
Test Acc -  0.10315627604722977

Model.fit() output for the last three epochs

Epoch 8/10
41/41 [==============================] - 3s 80ms/step - loss: 0.6094 - accuracy: 0.8861 - val_loss: 0.4489 - val_accuracy: 0.9483
Epoch 9/10
41/41 [==============================] - 3s 80ms/step - loss: 0.5377 - accuracy: 0.8953 - val_loss: 0.3868 - val_accuracy: 0.9554
Epoch 10/10
41/41 [==============================] - 3s 80ms/step - loss: 0.4663 - accuracy: 0.9092 - val_loss: 0.3404 - val_accuracy: 0.9590

Answer

I suspect that overfitting is causing this issue. You can check for it in the following way:


  1. Extract the training history from the model:

    history_dict = history.history
    history_dict.keys()
  2. Visualize the history:

    import matplotlib.pyplot as plt

    acc = history_dict['accuracy']
    val_acc = history_dict['val_accuracy']
    loss = history_dict['loss']
    val_loss = history_dict['val_loss']
    epochs = range(1, len(acc) + 1)

    plt.figure(figsize=(10, 10))

    # left panel: training curves
    ax1 = plt.subplot(221)
    ax1.plot(epochs, loss, 'bo', label='Training loss')
    ax1.plot(epochs, acc, 'ro', label='Training acc')
    ax1.set_title('Loss and acc of training')
    ax1.set_xlabel('Epochs')
    ax1.set_ylabel('Loss / acc')
    ax1.legend()

    # right panel: validation curves
    ax2 = plt.subplot(222)
    ax2.plot(epochs, val_acc, 'r', label='Validation acc')
    ax2.plot(epochs, val_loss, 'b', label='Validation loss')
    ax2.set_title('Loss and acc of validation')
    ax2.set_xlabel('Epochs')
    ax2.set_ylabel('Loss / acc')
    ax2.legend()

    plt.show()

Maybe the results you get look like these:

  • During training, the accuracy and loss keep changing with the epochs.
  • In validation, however, the accuracy and loss seem to reach a plateau after about 20 epochs.

Solution

It turns out that, when overfitting occurs, you can set fewer epochs to avoid this problem.
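
Rather than hand-tuning the epoch count, a common alternative (a sketch, not part of the original answer) is Keras's EarlyStopping callback, which halts training once the validation loss stops improving:

early_stop = tf.keras.callbacks.EarlyStopping(
  monitor='val_loss',         # watch the validation loss
  patience=3,                 # allow 3 epochs without improvement before stopping
  restore_best_weights=True)  # keep the weights from the best epoch

history = model.fit(
  train_ds,
  validation_data=val_ds,
  epochs=epochs,
  callbacks=[early_stop])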
