
Why is training-set accuracy during fit() different to accuracy calculated right after using predict on same data?

I have written a basic deep learning model in Tensorflow - Keras.

Why does the training-set accuracy reported at the end of training (0.4097) differ from the accuracy (0.6463) calculated directly afterwards on the same training data using the predict function (or using evaluate, which gives the same number)?

MWE below; output directly after.

from extra_keras_datasets import kmnist
import tensorflow
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.layers import BatchNormalization
import numpy as np


# Model configuration
no_classes = 10


# Load KMNIST dataset
(input_train, target_train), (input_test, target_test) = kmnist.load_data(type='kmnist')

# Shape of the input sets
input_train_shape = input_train.shape
input_test_shape = input_test.shape 

# Keras layer input shape
input_shape = (input_train_shape[1], input_train_shape[2], 1)



# Reshape the training data to include channels
input_train = input_train.reshape(input_train_shape[0], input_train_shape[1], input_train_shape[2], 1)
input_test = input_test.reshape(input_test_shape[0], input_test_shape[1], input_test_shape[2], 1)


# Parse numbers as floats
input_train = input_train.astype('float32')
input_test = input_test.astype('float32')

# Normalize input data
input_train = input_train / 255
input_test = input_test / 255


# Create the model
model = Sequential()
model.add(Flatten(input_shape=input_shape))
model.add(Dense(no_classes, activation='softmax'))


# Compile the model
model.compile(loss=tensorflow.keras.losses.sparse_categorical_crossentropy,
              optimizer=tensorflow.keras.optimizers.Adam(),
              metrics=['accuracy'])


# Fit data to model
history = model.fit(input_train, target_train,
            batch_size=2000,
            epochs=1,
            verbose=1)

prediction = model.predict(input_train)
print("Prediction accuracy = ", np.mean( np.argmax(prediction, axis=1) == target_train))

model.evaluate(input_train, target_train, verbose=2)

Last couple of lines of output:

30/30 [==============================] - 0s 3ms/step - loss: 1.8336 - accuracy: 0.4097
Prediction accuracy =  0.6463166666666667
1875/1875 - 1s - loss: 1.3406 - accuracy: 0.6463

Edit:

The initial answers below have solved my first problem by pointing out that the batch size matters when you only run 1 epoch. When running small batch sizes (or batch size = 1), or more epochs, you can push the post-fitting prediction accuracy pretty close to the final accuracy reported by the fit itself. Which is good!

I originally asked this question because I was having trouble with a more complex model.

I'm still having trouble understanding what's happening in this case (and yes, it involves batch normalisation). To get my MWE, replace everything below 'Create the model' above with the code below to implement a few fully connected layers with batch normalisation.

When you run two epochs of this, you'll see really stable accuracies across all 30 mini-batches (30 because the 60,000 samples in the training set divided by 2000 per batch gives 30). I see a very consistent 83% accuracy across the whole second epoch of training.

But the prediction after fitting is an abysmal 10% or thereabouts. Can anyone explain this?

model = Sequential()
model.add(Dense(50, activation='relu', input_shape = input_shape))
model.add(BatchNormalization())
model.add(Dense(20, activation='relu'))
model.add(BatchNormalization())
model.add(Flatten())
model.add(Dense(no_classes, activation='softmax'))


# Compile the model
model.compile(loss=tensorflow.keras.losses.sparse_categorical_crossentropy,
              optimizer=tensorflow.keras.optimizers.Adam(),
              metrics=['accuracy'])


# Fit data to model
history = model.fit(input_train, target_train,
            batch_size=2000,
            epochs=2,
            verbose=1)

prediction = model.predict(input_train)

print("Prediction accuracy = ", np.mean( np.argmax(prediction, axis=1) == target_train))

model.evaluate(input_train, target_train, verbose=2, batch_size=2000)

Output:

30/30 [==============================] - 46s 2s/step - loss: 0.5567 - accuracy: 0.8345
Prediction accuracy =  0.10098333333333333

One reason this can happen is that the last accuracy reported takes the entire epoch into account, during which the parameters were not constant and were still being optimized.

When evaluating the model, the parameters stop changing and remain in their final (hopefully, most optimized) state. This is unlike during the last epoch, when the parameters passed through all kinds of (hopefully, less optimized) states, more so at the start of the epoch.
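A quick way to see this (a minimal sketch of my own, assuming the OP's first model is compiled as above; the FullSetAccuracy class is a hypothetical helper, not part of Keras): a callback that scores the full training set after every batch, so you can watch the gap between the running average that fit() prints and the accuracy of the weights as they currently stand.

import numpy as np
import tensorflow as tf

class FullSetAccuracy(tf.keras.callbacks.Callback):
    # Hypothetical helper: after each training batch, score the whole
    # training set with the weights exactly as they are at that moment.
    # (Slow - this is only meant to illustrate the point.)
    def __init__(self, x, y):
        super().__init__()
        self.x, self.y = x, y

    def on_train_batch_end(self, batch, logs=None):
        # training=False gives a plain forward pass with the current weights
        probs = self.model(self.x, training=False).numpy()
        full_acc = np.mean(np.argmax(probs, axis=1) == self.y)
        # logs['accuracy'] is fit()'s running average over all batches so far
        print(f"  batch {batch}: running average {logs['accuracy']:.4f}, "
              f"full-set accuracy now {full_acc:.4f}")

history = model.fit(input_train, target_train, batch_size=2000, epochs=1,
                    verbose=1, callbacks=[FullSetAccuracy(input_train, target_train)])

In the OP's run you would expect the full-set number to climb towards the 0.6463 that predict reports, while the running average lags behind at around 0.41, because the early, poorly-fitting batches are averaged in.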


Deleted because I now see you didn't use batch norm in this case.


I am assuming this is due to BatchNormalization.

See for example here:

During training, a moving average is used.

During inference, we already have the normalization parameters.

This is likely to be the cause of the difference.

Please try without it, and see if such drastic differences still exist.
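To make that concrete, here is a minimal sketch of my own (assuming the OP's batch-norm model has just been fitted as above): the weights are identical in both calls; only the training flag changes which statistics the BatchNormalization layers normalize with.

import numpy as np

# Inference mode (what predict and evaluate use): BatchNormalization
# normalizes with its slowly-accumulated moving mean and variance.
infer_preds = model.predict(input_train, verbose=0)
infer_acc = np.mean(np.argmax(infer_preds, axis=1) == target_train)

# Training mode (what fit() reports on): BatchNormalization normalizes
# with the statistics of the batch being passed through. Note this call
# also nudges the moving statistics as a side effect.
batch_preds = model(input_train[:2000], training=True).numpy()
batch_acc = np.mean(np.argmax(batch_preds, axis=1) == target_train[:2000])

print("inference-mode accuracy:", infer_acc)  # ~0.10 in the OP's two-epoch run
print("training-mode accuracy: ", batch_acc)  # should sit near the ~0.83 fit() printed

After enough epochs the moving statistics converge towards the per-batch statistics, which is why the longer run in the next answer no longer shows the gap.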

Just adding to @Gulzar's answer: this effect can be very clear because the OP used only one epoch (a lot of parameters are changing at the very beginning of training), the batch size in the evaluate method (which defaults to 32) is not equal to the batch size in the fit method, and the batch size is a lot smaller than the whole dataset (meaning a lot of updates during each epoch).

Just adding more epochs to the same experiment would attenuate this effect.

# Fit data to model
history = model.fit(input_train, target_train,
            batch_size=2000,
            epochs=40,
            verbose=1)

Result

Epoch 40/40
30/30 [==============================] - 0s 11ms/step - loss: 0.5663 - accuracy: 0.8339
Prediction accuracy =  0.8348
1875/1875 - 2s - loss: 0.5643 - accuracy: 0.8348 - 2s/epoch - 1ms/step
[0.5643048882484436, 0.8348000049591064]

