Why does the training-set accuracy during fit() differ from the accuracy computed right afterwards with predict on the same data?

Question

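I wrote a basic deep learning model in TensorFlow/Keras.

Why does the training-set accuracy reported at the end of training (0.4097) differ from the accuracy computed immediately afterwards on the same training data with predict (or with evaluate, which gives the same number), 0.6463?

The MWE is below, followed directly by its output.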

from extra_keras_datasets import kmnist
import tensorflow
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.layers import BatchNormalization
import numpy as np


# Model configuration
no_classes = 10


# Load KMNIST dataset
(input_train, target_train), (input_test, target_test) = kmnist.load_data(type='kmnist')

# Shape of the input sets
input_train_shape = input_train.shape
input_test_shape = input_test.shape 

# Keras layer input shape
input_shape = (input_train_shape[1], input_train_shape[2], 1)



# Reshape the training data to include channels
input_train = input_train.reshape(input_train_shape[0], input_train_shape[1], input_train_shape[2], 1)
input_test = input_test.reshape(input_test_shape[0], input_test_shape[1], input_test_shape[2], 1)


# Parse numbers as floats
input_train = input_train.astype('float32')
input_test = input_test.astype('float32')

# Normalize input data
input_train = input_train / 255
input_test = input_test / 255


# Create the model
model = Sequential()
model.add(Flatten(input_shape=input_shape))
model.add(Dense(no_classes, activation='softmax'))


# Compile the model
model.compile(loss=tensorflow.keras.losses.sparse_categorical_crossentropy,
              optimizer=tensorflow.keras.optimizers.Adam(),
              metrics=['accuracy'])


# Fit data to model
history = model.fit(input_train, target_train,
            batch_size=2000,
            epochs=1,
            verbose=1)

prediction = model.predict(input_train)
print("Prediction accuracy = ", np.mean( np.argmax(prediction, axis=1) == target_train))

model.evaluate(input_train, target_train, verbose=2)

Last few lines of the output:

30/30 [==============================] - 0s 3ms/step - loss: 1.8336 - accuracy: 0.4097
Prediction accuracy =  0.6463166666666667
1875/1875 - 1s - loss: 1.3406 - accuracy: 0.6463

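Edit.

The initial answers below resolved my first question by pointing out that batch size matters when you run only one epoch. With small batches (or a batch size of 1), or with more epochs, the post-fit prediction accuracy comes very close to the final accuracy reported by fit itself. Which is good!

I originally asked this question because I ran into problems with a more complex model.

I still cannot understand what is happening in that case (and yes, it involves batch normalization). To get my MWE, replace everything below "Create the model" above with the code below, which implements a few fully connected layers with batch normalization.

When you run this for two epochs, the accuracy is very stable across all 30 mini-batches (30 because the 60,000 training samples are divided into batches of 2000). Throughout the second epoch I see a very consistent 83% accuracy.

But the prediction after fitting is an abysmal 10% or so. Can anyone explain this?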

model = Sequential()
model.add(Dense(50, activation='relu', input_shape = input_shape))
model.add(BatchNormalization())
model.add(Dense(20, activation='relu'))
model.add(BatchNormalization())
model.add(Flatten())
model.add(Dense(no_classes, activation='softmax'))


# Compile the model
model.compile(loss=tensorflow.keras.losses.sparse_categorical_crossentropy,
              optimizer=tensorflow.keras.optimizers.Adam(),
              metrics=['accuracy'])


# Fit data to model
history = model.fit(input_train, target_train,
            batch_size=2000,
            epochs=2,
            verbose=1)

prediction = model.predict(input_train)

print("Prediction accuracy = ", np.mean( np.argmax(prediction, axis=1) == target_train))

model.evaluate(input_train, target_train, verbose=2, batch_size=2000)

30/30 [==============================] - 46s 2s/step - loss: 0.5567 - accuracy: 0.8345
Prediction accuracy =  0.10098333333333333

Answer 1

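One reason this can happen is that the last accuracy reported takes the entire epoch into account, during which the parameters are not constant and are still being optimized.

When the model is evaluated, the parameters no longer change; they stay in their final (hopefully most optimized) state. During the last epoch, by contrast, the parameters passed through various (hopefully less optimized) states, especially at the start of the epoch.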
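To make the averaging argument concrete, here is a minimal sketch in plain Python. It assumes the accuracy shown by fit() is accumulated over all batches of the epoch, which is how Keras' stateful metrics behave. The per-batch accuracies are made-up illustrative numbers, not measurements from the model above, and `running_epoch_accuracy` is a hypothetical helper name.

```python
def running_epoch_accuracy(batch_accuracies, batch_sizes):
    """Sample-weighted mean of per-batch accuracies.

    This mimics a metric that is accumulated batch by batch and
    reset only at the start of each epoch.
    """
    total = sum(batch_sizes)
    correct = sum(acc * n for acc, n in zip(batch_accuracies, batch_sizes))
    return correct / total


# Hypothetical per-batch accuracies: each one is measured with the weights
# as they were at that step, so they rise as the optimizer makes progress.
num_batches = 30
batch_accuracies = [0.10 + 0.55 * i / (num_batches - 1) for i in range(num_batches)]
batch_sizes = [2000] * num_batches

reported = running_epoch_accuracy(batch_accuracies, batch_sizes)
final_weights_accuracy = batch_accuracies[-1]

print(f"accuracy reported at the end of the epoch: {reported:.4f}")
print(f"accuracy of the last batch (near-final weights): {final_weights_accuracy:.4f}")
```

The reported figure sits well below the accuracy of the final weights because the early, poorly trained batches are still part of the average.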

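The following was deleted because I now see that you did not use batch norm in this case.

I assume this is due to BatchNormalization.

See, for example, here.

During training, each batch is normalized with its own statistics while a moving average of those statistics is updated.

During inference, the normalization parameters are already fixed: the stored moving averages are used.

This is likely the cause of the difference.

Please try without it and see whether such a drastic difference still exists.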
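The training/inference distinction can be sketched with a toy, single-feature batch norm in plain Python. This is a simplification and not the Keras implementation: it omits the learned scale and offset, and `ScalarBatchNorm` and the input values are hypothetical. The momentum (0.99), epsilon (0.001), and the initial moving mean and variance (0 and 1) are the Keras `BatchNormalization` defaults. Whether this accounts for the 10% result in the question is not confirmed by the thread.

```python
import math

MOMENTUM = 0.99  # Keras BatchNormalization default
EPSILON = 1e-3   # Keras BatchNormalization default


class ScalarBatchNorm:
    """Toy batch norm for one feature, without learned scale/offset."""

    def __init__(self):
        self.moving_mean = 0.0  # Keras initializes the moving mean to 0
        self.moving_var = 1.0   # and the moving variance to 1

    def __call__(self, batch, training):
        if training:
            # Training mode: normalize with the statistics of this batch...
            mean = sum(batch) / len(batch)
            var = sum((x - mean) ** 2 for x in batch) / len(batch)
            # ...and nudge the moving averages towards them.
            self.moving_mean = MOMENTUM * self.moving_mean + (1 - MOMENTUM) * mean
            self.moving_var = MOMENTUM * self.moving_var + (1 - MOMENTUM) * var
        else:
            # Inference mode: use the stored moving averages only.
            mean, var = self.moving_mean, self.moving_var
        return [(x - mean) / math.sqrt(var + EPSILON) for x in batch]


# Hypothetical activations with mean 10 and variance 4.
batch = [8.0, 12.0] * 1000

bn = ScalarBatchNorm()
for _ in range(60):  # 2 epochs x 30 batches, as in the question
    train_out = bn(batch, training=True)
infer_out = bn(batch, training=False)

print("moving mean after 60 updates:", round(bn.moving_mean, 3))
print("mean of training-mode output:", sum(train_out) / len(train_out))
print("mean of inference-mode output:", round(sum(infer_out) / len(infer_out), 3))
```

With only 60 updates at momentum 0.99, the moving mean has covered less than half the distance from 0 to the true mean, so the same input is normalized very differently in the two modes. Training for more steps or lowering the `momentum` argument would close that gap.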

Answer 2

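Just adding to @Gulzar's answer: this effect can be very noticeable because the OP used only one epoch (many parameters change at the very beginning of training), the batch size differs between the evaluate method (default 32) and the fit method, and the batch size is much smaller than the whole dataset (meaning many updates in every epoch).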
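The step counts in the logs follow directly from these batch sizes. A small sketch, where `steps_per_pass` is a hypothetical helper rather than a Keras function:

```python
import math


def steps_per_pass(num_samples, batch_size):
    """Number of batches needed to go through the data once."""
    return math.ceil(num_samples / batch_size)


NUM_TRAIN = 60_000  # size of the KMNIST training set used in the question

fit_steps = steps_per_pass(NUM_TRAIN, 2000)  # fit(batch_size=2000)
eval_steps = steps_per_pass(NUM_TRAIN, 32)   # evaluate() with its default batch size

print(fit_steps)   # 30, matching "30/30" in the fit log
print(eval_steps)  # 1875, matching "1875/1875" in the evaluate log
```

So one epoch with batch_size=2000 means 30 weight updates, and the accuracy printed by fit is spread over all 30 intermediate states of the model.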

Simply adding more epochs to the same experiment attenuates this effect.

# Fit data to model
history = model.fit(input_train, target_train,
            batch_size=2000,
            epochs=40,
            verbose=1)

Result:

Epoch 40/40
30/30 [==============================] - 0s 11ms/step - loss: 0.5663 - accuracy: 0.8339
Prediction accuracy =  0.8348
1875/1875 - 2s - loss: 0.5643 - accuracy: 0.8348 - 2s/epoch - 1ms/step
[0.5643048882484436, 0.8348000049591064]


Solution 1
1 Accepted 2022-01-20 13:01:49

Solution 2
1 2022-01-20 15:15:29
