
In tensorflow/keras, why is the loss returned by train_on_batch different when recomputed after training using predict_on_batch?

I created a model with Keras and trained it with train_on_batch. To check that the model does what it is supposed to, I recomputed the loss before and after the training step using the predict_on_batch method. But, as you can guess from the title, I do not get the same loss values.

Here is a minimal example illustrating my problem:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import tensorflow as tf
import numpy as np 

# Loss definition
def mse(y_true, y_pred):
    return tf.reduce_mean(tf.square(y_true-y_pred))

# Model definition
model = Sequential()
model.add(Dense(1))
model.compile('rmsprop',mse)

# Data creation
batch_size = 10
x = np.random.random_sample([batch_size,10])
y = np.random.random_sample(batch_size)

# Print loss before training
y_pred = model.predict_on_batch(x)
print("Before: " + str(mse(y,y_pred).numpy()))

# Print loss output from train_on_batch
print("Train output: " + str(model.train_on_batch(x,y)))

# Print loss after training
y_pred = model.predict_on_batch(x)
print("After: " + str(mse(y,y_pred).numpy()))

Running this code, I get the following output:

Before: 0.28556848
Train output: 0.29771945
After: 0.27345362

I expected the training loss and the loss recomputed right after training to be identical, so I would like to understand why they are not.

That is simply how train_on_batch works: it computes the loss first and then updates the network, so the loss it returns is the one from before the update. When we then call predict_on_batch, we get predictions from the already-updated network.
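This ordering can be sketched with a custom training step. The code below is a simplified illustration of the compute-loss-then-update sequence (a sketch, not the actual train_on_batch implementation; `train_step` and `loss_fn` are illustrative names):

```python
import numpy as np
import tensorflow as tf

# Sketch of the order of operations inside a training step: the loss is
# computed under the CURRENT weights, and only afterwards are the weights
# updated, so the returned value predates the update.
def train_step(model, loss_fn, x, y):
    with tf.GradientTape() as tape:
        y_pred = model(x, training=True)
        loss = loss_fn(y, y_pred)  # loss under the pre-update weights
    grads = tape.gradient(loss, model.trainable_variables)
    model.optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss  # this is why "Train output" matches "Before", not "After"

loss_fn = lambda y_true, y_pred: tf.reduce_mean(tf.square(y_true - y_pred))
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(10,))])
model.compile('sgd', loss_fn)

x = np.random.random_sample([10, 10]).astype('float32')
y = np.random.random_sample([10, 1]).astype('float32')

before = loss_fn(y, model(x)).numpy()
reported = train_step(model, loss_fn, x, y).numpy()
after = loss_fn(y, model(x)).numpy()
# `reported` equals `before`; `after` differs because the weights have moved.
```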

Under the hood, train_on_batch also does quite a bit more, such as fixing data types, standardizing the data, and so on.

The closest counterpart to train_on_batch is test_on_batch. If you run test_on_batch, you will find the result is close to that of train_on_batch, but not identical.

Here is the implementation of test_on_batch: https://github.com/tensorflow/tensorflow/blob/e5bf8de410005de06a7ff5393fafdf832ef1d4ad/tensorflow/python/keras/engine/training_v2_utils.py#L442

It internally calls _standardize_user_data to fix your data types, data shapes, and so on.

Once x and y are fixed up with the proper shapes and data types, the results become very close, apart from a small delta due to numerical instability.
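The shape fix matters here in particular: with y of shape (10,) and predictions of shape (10, 1), the custom mse broadcasts the difference to a (10, 10) matrix before averaging, which by itself skews the "Before"/"After" values. A minimal sketch of doing the same fix-up by hand (`standardize` and `n_features` are illustrative names, not Keras API; the real _standardize_user_data does more, e.g. sample-weight handling):

```python
import numpy as np

# Hand-rolled sketch of the dtype/shape fixes Keras applies internally:
# cast to float32 and give both arrays explicit 2-D shapes.
def standardize(x, y, n_features):
    x = np.asarray(x, dtype='float32').reshape(-1, n_features)
    y = np.asarray(y, dtype='float32').reshape(-1, 1)
    return x, y

x = np.random.random_sample([10, 10])  # float64, shape (10, 10)
y = np.random.random_sample(10)        # float64, shape (10,)
x, y = standardize(x, y, 10)           # float32, shapes (10, 10) and (10, 1)
```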

Here is a minimal example where test_on_batch, train_on_batch, and predict_on_batch appear to agree numerically on the result.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import tensorflow as tf
import numpy as np

# Loss definition
def mse(y_true, y_pred):
    return tf.reduce_mean(tf.square(y_true-y_pred))

# Model definition
model = Sequential()
model.add(Dense(1, input_shape = (10,)))
model.compile(optimizer = 'adam', loss = mse, metrics = [mse])

# Data creation
batch_size = 10
x = np.random.random_sample([batch_size,10]).astype('float32').reshape(-1, 10)
y = np.random.random_sample(batch_size).astype('float32').reshape(-1,1)

print(x.shape)
print(y.shape)

model.summary()

# running 5 iterations to check
for _ in range(5):

  # Print loss before training
  y_pred = model.predict_on_batch(x)
  print("Before: " + str(mse(y,y_pred).numpy()))

  # Print loss output from train_on_batch
  print("Train output: " + str(model.train_on_batch(x,y)))

  print(model.test_on_batch(x, y))

  # Print loss after training
  y_pred = model.predict_on_batch(x)
  print("After: " + str(mse(y,y_pred).numpy()))

This prints:

(10, 10)
(10, 1)
Model: "sequential_25"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_27 (Dense)             (None, 1)                 11        
=================================================================
Total params: 11
Trainable params: 11
Non-trainable params: 0
_________________________________________________________________
Before: 0.30760005
Train output: [0.3076000511646271, 0.3076000511646271]
[0.3052913546562195, 0.3052913546562195]
After: 0.30529135
Before: 0.30529135
Train output: [0.3052913546562195, 0.3052913546562195]
[0.30304449796676636, 0.30304449796676636]
After: 0.3030445
Before: 0.3030445
Train output: [0.30304449796676636, 0.30304449796676636]
[0.3008604645729065, 0.3008604645729065]
After: 0.30086046
Before: 0.30086046
Train output: [0.3008604645729065, 0.3008604645729065]
[0.2987399995326996, 0.2987399995326996]
After: 0.29874
Before: 0.29874
Train output: [0.2987399995326996, 0.2987399995326996]
[0.2966836094856262, 0.2966836094856262]
After: 0.2966836

Note: train_on_batch updates the weights of the network after computing the loss, so obviously the losses from train_on_batch, test_on_batch, and predict_on_batch will not be exactly the same. The right question would be why test_on_batch and predict_on_batch give different losses on your data.

Thanks to Zabir Al Nazi, I understood the problem: the keras predict_on_batch output differs from test_on_batch and train_on_batch because of the additional standardization applied to the data.

If you have to use predict_on_batch, you must standardize your data beforehand (at least in tensorflow 2.1.0, though this may change in later versions). It can be done by hand, or with the _standardize_user_data function.

Here is my earlier code corrected using this function:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import tensorflow as tf
import numpy as np 

# Loss definition
def mse(y_true, y_pred):
    return tf.reduce_mean(tf.square(y_true-y_pred))

# Model definition
model = Sequential()
model.add(Dense(1))
model.compile('rmsprop',mse)

# Data creation
batch_size = 10
x = np.random.random_sample([batch_size,10])
y = np.random.random_sample(batch_size)

# STANDARDIZATION
tmp_x,tmp_y,_ = model._standardize_user_data(x,y)
x,y = tmp_x[0], tmp_y[0]

# Print loss before training
y_pred = model.predict_on_batch(x)
print("Before: " + str(mse(y,y_pred).numpy()))

# Print loss output from train_on_batch
print("Train output: " + str(model.train_on_batch(x,y)))

# Print loss after training
y_pred = model.predict_on_batch(x)
print("After: " + str(mse(y,y_pred).numpy()))

This gives the proper output:

Before: 0.425879
Train output: 0.425879
After: 0.4123691
