
In tensorflow/keras, why is the loss returned by train_on_batch different when recomputed after training using predict_on_batch?

I created a model with Keras and trained it with train_on_batch. To check that the model does what it is supposed to, I recomputed the loss before and after the training step using the predict_on_batch method. But, as you can guess from the title, I do not get the same loss values.

Here is a minimal example illustrating my problem:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import tensorflow as tf
import numpy as np 

# Loss definition
def mse(y_true, y_pred):
    return tf.reduce_mean(tf.square(y_true-y_pred))

# Model definition
model = Sequential()
model.add(Dense(1))
model.compile('rmsprop',mse)

# Data creation
batch_size = 10
x = np.random.random_sample([batch_size,10])
y = np.random.random_sample(batch_size)

# Print loss before training
y_pred = model.predict_on_batch(x)
print("Before: " + str(mse(y,y_pred).numpy()))

# Print loss output from train_on_batch
print("Train output: " + str(model.train_on_batch(x,y)))

# Print loss after training
y_pred = model.predict_on_batch(x)
print("After: " + str(mse(y,y_pred).numpy()))

Running this code, I get the following output:

Before: 0.28556848
Train output: 0.29771945
After: 0.27345362

I expected the training loss and the loss recomputed right after training to be identical, so I would like to understand why they are not.

That is simply how train_on_batch works: it computes the loss first and then updates the network, so the loss it returns is the one from before the update. When we then call predict_on_batch, we get predictions from the already-updated network.
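This ordering can be sketched with a custom training step. The code below is a simplified illustration of the compute-loss-then-update sequence (a sketch, not the actual train_on_batch implementation; `train_step` and `loss_fn` are illustrative names):

```python
import numpy as np
import tensorflow as tf

# Sketch of the order of operations inside a training step: the loss is
# computed under the CURRENT weights, and only afterwards are the weights
# updated, so the returned value predates the update.
def train_step(model, loss_fn, x, y):
    with tf.GradientTape() as tape:
        y_pred = model(x, training=True)
        loss = loss_fn(y, y_pred)  # loss under the pre-update weights
    grads = tape.gradient(loss, model.trainable_variables)
    model.optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss  # this is why "Train output" matches "Before", not "After"

loss_fn = lambda y_true, y_pred: tf.reduce_mean(tf.square(y_true - y_pred))
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(10,))])
model.compile('sgd', loss_fn)

x = np.random.random_sample([10, 10]).astype('float32')
y = np.random.random_sample([10, 1]).astype('float32')

before = loss_fn(y, model(x)).numpy()
reported = train_step(model, loss_fn, x, y).numpy()
after = loss_fn(y, model(x)).numpy()
# `reported` equals `before`; `after` differs because the weights have moved.
```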

Under the hood, train_on_batch also does quite a bit more, such as fixing data types, standardizing the data, and so on.

The closest counterpart to train_on_batch is test_on_batch. If you run test_on_batch, you will find the result is close to that of train_on_batch, but not identical.

Here is the implementation of test_on_batch: https://github.com/tensorflow/tensorflow/blob/e5bf8de410005de06a7ff5393fafdf832ef1d4ad/tensorflow/python/keras/engine/training_v2_utils.py#L442

It internally calls _standardize_user_data to fix your data types, data shapes, and so on.

Once x and y are fixed up with the proper shapes and data types, the results become very close, apart from a small delta due to numerical instability.
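The shape fix matters here in particular: with y of shape (10,) and predictions of shape (10, 1), the custom mse broadcasts the difference to a (10, 10) matrix before averaging, which by itself skews the "Before"/"After" values. A minimal sketch of doing the same fix-up by hand (`standardize` and `n_features` are illustrative names, not Keras API; the real _standardize_user_data does more, e.g. sample-weight handling):

```python
import numpy as np

# Hand-rolled sketch of the dtype/shape fixes Keras applies internally:
# cast to float32 and give both arrays explicit 2-D shapes.
def standardize(x, y, n_features):
    x = np.asarray(x, dtype='float32').reshape(-1, n_features)
    y = np.asarray(y, dtype='float32').reshape(-1, 1)
    return x, y

x = np.random.random_sample([10, 10])  # float64, shape (10, 10)
y = np.random.random_sample(10)        # float64, shape (10,)
x, y = standardize(x, y, 10)           # float32, shapes (10, 10) and (10, 1)
```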

Here is a minimal example where test_on_batch, train_on_batch, and predict_on_batch appear to agree numerically on the result.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import tensorflow as tf
import numpy as np

# Loss definition
def mse(y_true, y_pred):
    return tf.reduce_mean(tf.square(y_true-y_pred))

# Model definition
model = Sequential()
model.add(Dense(1, input_shape = (10,)))
model.compile(optimizer = 'adam', loss = mse, metrics = [mse])

# Data creation
batch_size = 10
x = np.random.random_sample([batch_size,10]).astype('float32').reshape(-1, 10)
y = np.random.random_sample(batch_size).astype('float32').reshape(-1,1)

print(x.shape)
print(y.shape)

model.summary()

# running 5 iterations to check
for _ in range(5):

  # Print loss before training
  y_pred = model.predict_on_batch(x)
  print("Before: " + str(mse(y,y_pred).numpy()))

  # Print loss output from train_on_batch
  print("Train output: " + str(model.train_on_batch(x,y)))

  print(model.test_on_batch(x, y))

  # Print loss after training
  y_pred = model.predict_on_batch(x)
  print("After: " + str(mse(y,y_pred).numpy()))

This prints:

(10, 10)
(10, 1)
Model: "sequential_25"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_27 (Dense)             (None, 1)                 11        
=================================================================
Total params: 11
Trainable params: 11
Non-trainable params: 0
_________________________________________________________________
Before: 0.30760005
Train output: [0.3076000511646271, 0.3076000511646271]
[0.3052913546562195, 0.3052913546562195]
After: 0.30529135
Before: 0.30529135
Train output: [0.3052913546562195, 0.3052913546562195]
[0.30304449796676636, 0.30304449796676636]
After: 0.3030445
Before: 0.3030445
Train output: [0.30304449796676636, 0.30304449796676636]
[0.3008604645729065, 0.3008604645729065]
After: 0.30086046
Before: 0.30086046
Train output: [0.3008604645729065, 0.3008604645729065]
[0.2987399995326996, 0.2987399995326996]
After: 0.29874
Before: 0.29874
Train output: [0.2987399995326996, 0.2987399995326996]
[0.2966836094856262, 0.2966836094856262]
After: 0.2966836

Note: train_on_batch updates the weights of the network after computing the loss, so obviously the losses from train_on_batch, test_on_batch, and predict_on_batch will not be exactly the same. The right question would be why test_on_batch and predict_on_batch give different losses on your data.

Thanks to Zabir Al Nazi, I understood the problem: the keras predict_on_batch output differs from test_on_batch and train_on_batch because of the additional standardization applied to the data.

If you have to use predict_on_batch, you must standardize your data beforehand (at least in tensorflow 2.1.0, though this may change in later versions). It can be done by hand, or with the _standardize_user_data function.

Here is my earlier code corrected using this function:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import tensorflow as tf
import numpy as np 

# Loss definition
def mse(y_true, y_pred):
    return tf.reduce_mean(tf.square(y_true-y_pred))

# Model definition
model = Sequential()
model.add(Dense(1))
model.compile('rmsprop',mse)

# Data creation
batch_size = 10
x = np.random.random_sample([batch_size,10])
y = np.random.random_sample(batch_size)

# STANDARDIZATION
tmp_x,tmp_y,_ = model._standardize_user_data(x,y)
x,y = tmp_x[0], tmp_y[0]

# Print loss before training
y_pred = model.predict_on_batch(x)
print("Before: " + str(mse(y,y_pred).numpy()))

# Print loss output from train_on_batch
print("Train output: " + str(model.train_on_batch(x,y)))

# Print loss after training
y_pred = model.predict_on_batch(x)
print("After: " + str(mse(y,y_pred).numpy()))

This gives the proper output:

Before: 0.425879
Train output: 0.425879
After: 0.4123691
