TensorFlow model.evaluate() gives a different result from that obtained during training

I am using TensorFlow to do multi-class classification.

I load the training and validation datasets in the following way:

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="training",
  shuffle=True,
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)

val_ds = tf.keras.preprocessing.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="validation",
  shuffle=True,
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)

Then I train the model using model.fit():

history = model.fit(
  train_ds,
  validation_data=val_ds,
  epochs=epochs,
  shuffle=True
)

I get a validation accuracy of around 95%.

But when I load the same validation set and use model.evaluate():

model.evaluate(val_ds)

I get a very low accuracy (around 10%).

Why am I getting such different results? Am I using the model.evaluate() function incorrectly?

Note: In model.compile() I specify the following: optimizer - Adam, loss - SparseCategoricalCrossentropy, metric - accuracy.
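
For reference, a minimal sketch of the compile call described in the note. Whether from_logits=True is needed depends on the model's final layer, which the question does not show, so that flag is an assumption here:

model.compile(
  optimizer='adam',
  # assumption: the final Dense layer outputs raw logits (no softmax);
  # use from_logits=False (the default) if the model ends with a softmax
  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
  metrics=['accuracy'])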

Model.evaluate() output

41/41 [==============================] - 5s 118ms/step - loss: 0.3037 - accuracy: 0.1032
Test Loss -  0.3036555051803589
Test Acc -  0.10315627604722977

Model.fit() output for the last three epochs

Epoch 8/10
41/41 [==============================] - 3s 80ms/step - loss: 0.6094 - accuracy: 0.8861 - val_loss: 0.4489 - val_accuracy: 0.9483
Epoch 9/10
41/41 [==============================] - 3s 80ms/step - loss: 0.5377 - accuracy: 0.8953 - val_loss: 0.3868 - val_accuracy: 0.9554
Epoch 10/10
41/41 [==============================] - 3s 80ms/step - loss: 0.4663 - accuracy: 0.9092 - val_loss: 0.3404 - val_accuracy: 0.9590

Answer

I suspect that overfitting is causing this issue. You can check for it in the following way:


  1. Extract the training history from the model:

    history_dict = history.history
    history_dict.keys()
  2. Visualize the history:

    import matplotlib.pyplot as plt

    acc = history_dict['accuracy']
    val_acc = history_dict['val_accuracy']
    loss = history_dict['loss']
    val_loss = history_dict['val_loss']
    epochs = range(1, len(acc) + 1)

    plt.figure(figsize=(10, 10))

    # left panel: training curves
    ax1 = plt.subplot(221)
    ax1.plot(epochs, loss, 'bo', label='Training loss')
    ax1.plot(epochs, acc, 'ro', label='Training acc')
    ax1.set_title('Loss and acc of training')
    ax1.set_xlabel('Epochs')
    ax1.set_ylabel('Loss / acc')
    ax1.legend()

    # right panel: validation curves
    ax2 = plt.subplot(222)
    ax2.plot(epochs, val_acc, 'r', label='Validation acc')
    ax2.plot(epochs, val_loss, 'b', label='Validation loss')
    ax2.set_title('Loss and acc of validation')
    ax2.set_xlabel('Epochs')
    ax2.set_ylabel('Loss / acc')
    ax2.legend()

    plt.show()

Maybe the results you get look like these:

  • During training, the accuracy and loss keep changing with the epochs.
  • In validation, however, the accuracy and loss seem to reach a plateau after about 20 epochs.

Solution

It turns out that, when overfitting occurs, you can set fewer epochs to avoid this problem.
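
Rather than hand-tuning the epoch count, a common alternative (a sketch, not part of the original answer) is Keras's EarlyStopping callback, which halts training once the validation loss stops improving:

early_stop = tf.keras.callbacks.EarlyStopping(
  monitor='val_loss',         # watch the validation loss
  patience=3,                 # allow 3 epochs without improvement before stopping
  restore_best_weights=True)  # keep the weights from the best epoch

history = model.fit(
  train_ds,
  validation_data=val_ds,
  epochs=epochs,
  callbacks=[early_stop])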
