
Loss and accuracy difference between .evaluate() and sklearn classification_report()

When training a model in TensorFlow, there is a clear discrepancy between the metrics reported by .evaluate() and those from sklearn's classification_report. During training the history shows good accuracy, and .evaluate() reports roughly the same, but the sklearn metrics are completely different.

import tensorflow as tf
import tensorflow_datasets as tfds
from sklearn.metrics import classification_report

(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)

def normalize_img(image, label):
  """Normalizes images: `uint8` -> `float32`."""
  return tf.cast(image, tf.float32) / 255., label

ds_train = ds_train.map(
    normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_train = ds_train.cache()
ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
ds_train = ds_train.batch(128)
ds_train = ds_train.prefetch(tf.data.experimental.AUTOTUNE)

ds_test = ds_test.map(
    normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_test = ds_test.batch(128)
ds_test = ds_test.cache()
ds_test = ds_test.prefetch(tf.data.experimental.AUTOTUNE)

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128,activation='relu'),
  tf.keras.layers.Dense(10)
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics='accuracy',
)

model.fit(
    ds_train,
    epochs=6,
    validation_data=ds_test,
)
Epoch 1/6
469/469 [==============================] - 1s 3ms/step - loss: 0.3586 - accuracy: 0.9009 - val_loss: 0.1961 - val_accuracy: 0.9435
Epoch 2/6
469/469 [==============================] - 1s 2ms/step - loss: 0.1634 - accuracy: 0.9529 - val_loss: 0.1310 - val_accuracy: 0.9619
Epoch 3/6
469/469 [==============================] - 1s 2ms/step - loss: 0.1142 - accuracy: 0.9676 - val_loss: 0.1089 - val_accuracy: 0.9670
Epoch 4/6
469/469 [==============================] - 1s 2ms/step - loss: 0.0883 - accuracy: 0.9743 - val_loss: 0.0913 - val_accuracy: 0.9721
Epoch 5/6
469/469 [==============================] - 1s 2ms/step - loss: 0.0709 - accuracy: 0.9795 - val_loss: 0.0795 - val_accuracy: 0.9772
Epoch 6/6
469/469 [==============================] - 1s 2ms/step - loss: 0.0590 - accuracy: 0.9826 - val_loss: 0.0762 - val_accuracy: 0.9768
<tensorflow.python.keras.callbacks.History at 0x1a603d02070>
loss, accuracy = model.evaluate(ds_train)
print("Loss:", loss)
print("Accuracy:", accuracy)
469/469 [==============================] - 1s 1ms/step - loss: 0.0484 - accuracy: 0.9867
Loss: 0.04843668267130852
Accuracy: 0.9867166876792908
train_probs = model.predict(ds_train)

train_preds = tf.argmax(train_probs, axis=-1)
train_labels_ds = ds_train.map(lambda image, label: label).unbatch()
y_true = next(iter(train_labels_ds.batch(60000))).numpy()

print(classification_report(y_true, train_preds))
 precision    recall  f1-score   support

           0       0.10      0.10      0.10      5923
           1       0.11      0.11      0.11      6742
           2       0.10      0.10      0.10      5958
           3       0.10      0.10      0.10      6131
           4       0.09      0.09      0.09      5842
           5       0.09      0.09      0.09      5421
           6       0.10      0.10      0.10      5918
           7       0.11      0.11      0.11      6265
           8       0.11      0.10      0.10      5851
           9       0.11      0.10      0.11      5949

    accuracy                           0.10     60000
   macro avg       0.10      0.10      0.10     60000
weighted avg       0.10      0.10      0.10     60000

As the code shows, the difference is obviously large, but I cannot figure out what the problem is. I also tried the metrics built into Keras and got the same result as with sklearn.

Note: this code comes from the official TensorFlow documentation tutorial.

Try changing this line to:

ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples, reshuffle_each_iteration=False)

By default, reshuffle_each_iteration is set to True, so even though the model is trained correctly, the labels you collect afterwards no longer match the predictions. From the documentation:

reshuffle_each_iteration — A boolean, which if true indicates that the dataset should be pseudorandomly reshuffled each time it is iterated over. (Defaults to True.)
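
A small sketch of this behaviour (using a hypothetical toy range dataset, not MNIST): with the default reshuffle_each_iteration=True, every pass over the dataset comes back in a different order, so labels gathered after model.predict() no longer line up with the predictions, while reshuffle_each_iteration=False reuses the same shuffled order on every pass.

import tensorflow as tf

ds = tf.data.Dataset.range(5)

# Default behaviour: each iteration reshuffles, so two passes give different orders
reshuffled = ds.shuffle(5, seed=42, reshuffle_each_iteration=True)
print([int(x) for x in reshuffled])  # e.g. [1, 4, 0, 3, 2]
print([int(x) for x in reshuffled])  # likely a different order on the second pass

# With reshuffle_each_iteration=False, every pass reuses the same shuffled order
fixed = ds.shuffle(5, seed=42, reshuffle_each_iteration=False)
print([int(x) for x in fixed])
print([int(x) for x in fixed])  # identical to the first pass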

Edit - another approach: iterate over the dataset once, collecting the predictions and the labels together:

import numpy as np

train_preds = np.array([])
y_true = np.array([])

# Collect predictions and labels from the same pass over the dataset,
# so the order cannot drift apart between the two
for x, y in ds_train:
  train_preds = np.concatenate([train_preds, np.argmax(model(x), axis=-1)])
  y_true = np.concatenate([y_true, y.numpy()])
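
Because the predictions and labels are now collected in the same pass over ds_train, they stay aligned, and the sklearn metrics should agree with model.evaluate() (a short usage sketch):

from sklearn.metrics import accuracy_score, classification_report

print("Accuracy:", accuracy_score(y_true, train_preds))
print(classification_report(y_true, train_preds))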
