
Why is the accuracy I calculate manually from model.predict() different from model.evaluate()'s accuracy?

The accuracy I calculate from the predicted labels and the true labels is much lower than the one reported by model.evaluate(). Is there something wrong in my code?

The output is:

Found 70 files belonging to 10 classes.
3/3 [==============================] - 0s 5ms/step - loss: 2.2923 - accuracy: 0.6714
Test accuracy: 0.6714285612106323
3/3 [==============================] - 0s 7ms/step
accuracy: 0.08571428571428572

My code is:

# Evaluate the CNN model by classes
test_data = tf.keras.utils.image_dataset_from_directory(
    data_dir + '/test',
    image_size=(32, 32),
    batch_size=train_batch_size)

test_loss, test_acc = modified.evaluate(test_data)
print('Test accuracy:', test_acc)

# Get the predicted labels
predicted_labels = modified.predict(test_data)
predicted_labels = np.argmax(predicted_labels, axis=1)

# Get the true labels
true_labels = []
for images, labels in test_data:
    true_labels.extend(labels.numpy())

# Calculate the accuracy
acc_count = 0
for i in range(len(true_labels)):
    if true_labels[i] == predicted_labels[i]:
        acc_count += 1
print('Test accuracy:', acc_count/len(true_labels))

This happens because your data is randomly shuffled twice:

  1. The first time is when you call modified.predict(test_data): the test_data iterator shuffles the data before feeding it to the model.

  2. The second time is when you manually iterate over the dataset with for images, labels in test_data: the data is shuffled again, so the label order differs from the order model.predict() saw.

Shuffling is the default behavior of datasets created with the image_dataset_from_directory function (shuffle=True, reshuffled on every iteration). You can check the documentation and the default argument values here:

https://www.tensorflow.org/api_docs/python/tf/keras/utils/image_dataset_from_directory
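This also explains why the mismatched accuracy lands near 0.086: with 10 classes, pairing predictions against labels from an independently shuffled pass gives roughly chance-level agreement, about 1/10. A small NumPy-only illustration (the data here is synthetic, not the asker's dataset):

```python
# Illustration: when predictions and true labels come from two independently
# shuffled passes over the same data, index-by-index comparison gives roughly
# chance-level accuracy (~1/num_classes), no matter how good the model is.
import numpy as np

rng = np.random.default_rng(0)
num_classes = 10
labels = rng.integers(0, num_classes, size=7000)  # labels seen in one pass

# Even a *perfect* model's predictions, taken from a reshuffled pass,
# no longer line up with the labels collected above:
predictions = rng.permutation(labels)

accuracy = np.mean(labels == predictions)
print(f"accuracy with misaligned order: {accuracy:.3f}")  # close to 1/10
```

This matches the 0.0857 the asker observed on a 10-class test set.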

If you set shuffle=False:

test_data = tf.keras.utils.image_dataset_from_directory(
    data_dir + '/test',
    image_size=(32, 32),
    batch_size=train_batch_size,
    shuffle=False,
)

the manually calculated accuracy and the one from model.evaluate() will be identical (at least for the accuracy metric; evaluate aggregates metrics over batches, which can lead to differences for some other metrics).
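Alternatively, you can keep shuffling on and still get matching numbers by predicting batch by batch in the same pass where you collect the labels, so the pairing can never desynchronize. A self-contained sketch with a tiny synthetic dataset and an untrained stand-in model (the real model and data from the question would work the same way):

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for the asker's test set: 64 RGB images, 10 classes.
x = np.random.rand(64, 32, 32, 3).astype("float32")
y = np.random.randint(0, 10, size=64)
ds = tf.data.Dataset.from_tensor_slices((x, y)).shuffle(64).batch(16)

# Minimal stand-in classifier (any compiled Keras model behaves the same).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Collect predictions and labels in the SAME pass, so they stay aligned
# even though the dataset reshuffles on every iteration.
true_labels, predicted_labels = [], []
for images, labels in ds:
    probs = model.predict_on_batch(images)  # predict this exact batch
    predicted_labels.extend(np.argmax(probs, axis=1))
    true_labels.extend(labels.numpy())

manual_acc = np.mean(np.array(true_labels) == np.array(predicted_labels))
loss, eval_acc = model.evaluate(ds, verbose=0)
print(manual_acc, eval_acc)  # the two accuracies now agree
```

Accuracy is a per-sample count aggregated over the whole dataset, so the shuffle order no longer matters once predictions and labels come from the same iteration.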
