
Tensorflow evaluate gives larger error than last epoch of training

I have a TensorFlow regression model. I don't think the details of the model's layers are relevant to the question, so I'm skipping them; I can add them if you think they would be useful.

I compile with the following code. The loss and metric are mean squared error.

model.compile(
    loss=tf.keras.losses.MeanSquaredError(), optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    metrics=['mse']
)

Now, I run the following code to train the network and evaluate it. I train for 2 epochs, then I evaluate the model on the same data with the evaluate method, and I also evaluate it by hand using the predict method and the MSE formula.

print('fit')
model.fit(X, y, epochs=2, batch_size=32)

print('evaluate')
print(model.evaluate(X, y))

print('manual evaluate')
print(((y - model.predict(X).ravel()) ** 2).mean())

Here is the result:

fit
Epoch 1/2
3152/3152 [==============================] - 12s 3ms/step - loss: 7.7276 - mse: 7.7275
Epoch 2/2
3152/3152 [==============================] - 11s 4ms/step - loss: 0.9898 - mse: 0.9894
evaluate
3152/3152 [==============================] - 2s 686us/step - loss: 1.3753 - mse: 1.3748
[1.3753225803375244, 1.3747814893722534]
manual evaluate
1.3747820755885116

I have slight regularization, so the loss is a bit greater than the MSE, as expected.

But, as you can see, the MSE is 0.98 at the end of the last epoch. However, I get an MSE of 1.37 when I evaluate with the evaluate method or when I calculate it manually. As far as I know, the model uses the weights from after the last epoch, so those two numbers should be equal, right? What am I missing here? I tried different batch_size values and epoch counts; the evaluated MSE is always higher than the MSE at the last epoch of the fit method.

Note: y is a one-dimensional NumPy array.

y.shape
> (100836,)

Edit: I ran the fit method with the validation_data parameter, using the same (X, y) as the validation data:

model.fit(X, y, epochs=2, batch_size=32, validation_data=(X, y))

Output:

Epoch 1/2
3152/3152 [==============================] - 23s 7ms/step - loss: 7.9766 - mse: 7.9764 - val_loss: 2.0284 - val_mse: 2.0280
Epoch 2/2
3152/3152 [==============================] - 22s 7ms/step - loss: 0.9839 - mse: 0.9836 - val_loss: 1.3436 - val_mse: 1.3431
evaluate
[1.3436073064804077, 1.3430677652359009]

Now it makes some sense. The val_mse of the last epoch matches the evaluate result. But I was expecting the mse and val_mse values in the progress bar to be the same, since the training data and validation data are identical. I think my understanding of what the progress bar shows is incorrect. Can someone explain how I should interpret the progress bar, and why the mse and val_mse values on it differ?

The reason why, for the same data, the metrics (the loss, in your case) differ between the training and validation steps is simple: during training, your model changes its parameters from batch to batch. In the progress bar you see the mean of the metric over all batches. During the validation step, by contrast, the parameters of your network are frozen; the parameters used are those obtained after processing the last batch the network has seen. This explains the difference.
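A minimal sketch of this, reusing model, X, and y from the question: the epoch-level mse reported by fit is an average over batches computed with changing weights, whereas evaluate runs every batch with the final weights, so the two numbers generally differ even on the same data.

history = model.fit(X, y, epochs=1, batch_size=32, verbose=0)
epoch_mse = history.history['mse'][-1]          # mean over batches, weights changing
final_mse = model.evaluate(X, y, verbose=0)[1]  # every batch uses the final weights
print(epoch_mse, final_mse)                     # these generally differ, as seen above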

The question of why the validation loss turned out to be bigger than the training loss is subtler. One reason might be that your model has layers that behave differently during training and validation (for example, BatchNorm, as noticed by Frightera). Another reason might be an improper learning rate: if it is too big, the parameters change too much and skip over the real minimum. Even with Adam optimization this might be the case.

To understand whether the problem is the learning rate, try making it much smaller. If the difference in the metric persists, then your network has layers that behave differently during the training and validation phases.
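A minimal sketch of this diagnostic, reusing the compile call from the question (the value 1e-5 is an arbitrary small choice, not a recommendation):

model.compile(
    loss=tf.keras.losses.MeanSquaredError(),
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    metrics=['mse']
)
model.fit(X, y, epochs=2, batch_size=32)
print(model.evaluate(X, y))  # if the gap persists, suspect train/inference-mode layers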

There might be other reasons for the difference in the metrics. For example, the training data may be noisy, so the network cannot train well. This will cause the loss to fluctuate near its mean, which is normal. To understand whether this is the case, you should study plots of the loss for different batches (for example, using TensorBoard).
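As a sketch of one way to get such plots without TensorBoard (again assuming model, X, and y from the question), a custom callback can record the loss reported after each batch. Note that in recent TF versions this value is a running average over the epoch so far rather than the raw per-batch loss:

import matplotlib.pyplot as plt

class BatchLossLogger(tf.keras.callbacks.Callback):
    def __init__(self):
        super().__init__()
        self.batch_losses = []

    def on_train_batch_end(self, batch, logs=None):
        # logs['loss'] is the running mean of the loss over the epoch so far
        self.batch_losses.append(logs['loss'])

logger = BatchLossLogger()
model.fit(X, y, epochs=1, batch_size=32, callbacks=[logger])

plt.plot(logger.batch_losses)
plt.xlabel('batch')
plt.ylabel('loss')
plt.show()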
