
The model is broken when I replaced keras with tf.keras

When I tried to use keras to build a simple autoencoder, I found something strange between keras and tf.keras.

import tensorflow as tf
tf.__version__

2.2.0

import matplotlib.pyplot as plt

(x_train, _), (x_test, _) = tf.keras.datasets.mnist.load_data()

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), 784))  # (60000, 784)
x_test = x_test.reshape((len(x_test), 784))     # (10000, 784)
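
A quick sanity check of the preprocessing (assuming the standard MNIST split of 60,000 training and 10,000 test images):

# Optional: flattened MNIST vectors, scaled into [0, 1]
assert x_train.shape == (60000, 784) and x_test.shape == (10000, 784)
assert x_train.min() >= 0.0 and x_train.max() <= 1.0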

The original picture:

plt.imshow(x_train[0].reshape(28, 28), cmap='gray')


import keras
# import tensorflow.keras as keras

my_autoencoder = keras.models.Sequential([
      keras.layers.Dense(64, input_shape=(784, ), activation='relu'),
      keras.layers.Dense(784, activation='sigmoid')                                             
])
my_autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

my_autoencoder.fit(x_train, x_train, epochs=10, shuffle=True, validation_data=(x_test, x_test))

Training:

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
60000/60000 [==============================] - 7s 112us/step - loss: 0.2233 - val_loss: 0.1670
Epoch 2/10
60000/60000 [==============================] - 7s 111us/step - loss: 0.1498 - val_loss: 0.1337
Epoch 3/10
60000/60000 [==============================] - 7s 110us/step - loss: 0.1254 - val_loss: 0.1152
Epoch 4/10
60000/60000 [==============================] - 7s 110us/step - loss: 0.1103 - val_loss: 0.1032
Epoch 5/10
60000/60000 [==============================] - 7s 110us/step - loss: 0.1010 - val_loss: 0.0963
Epoch 6/10
60000/60000 [==============================] - 7s 109us/step - loss: 0.0954 - val_loss: 0.0919
Epoch 7/10
60000/60000 [==============================] - 7s 109us/step - loss: 0.0917 - val_loss: 0.0889
Epoch 8/10
60000/60000 [==============================] - 7s 110us/step - loss: 0.0890 - val_loss: 0.0866
Epoch 9/10
60000/60000 [==============================] - 7s 110us/step - loss: 0.0870 - val_loss: 0.0850
Epoch 10/10
60000/60000 [==============================] - 7s 109us/step - loss: 0.0853 - val_loss: 0.0835

The decoded image with keras:

temp = my_autoencoder.predict(x_train)

plt.imshow(temp[0].reshape(28, 28), cmap='gray')


So far everything is as expected, but something weird happens when I replace keras with tf.keras:

# import keras
import tensorflow.keras as keras
my_autoencoder = keras.models.Sequential([
      keras.layers.Dense(64, input_shape=(784, ), activation='relu'),
      keras.layers.Dense(784, activation='sigmoid')                                             
])
my_autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

my_autoencoder.fit(x_train, x_train, epochs=10, shuffle=True, validation_data=(x_test, x_test))

Training:

Epoch 1/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6952 - val_loss: 0.6940
Epoch 2/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6929 - val_loss: 0.6918
Epoch 3/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6907 - val_loss: 0.6896
Epoch 4/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6885 - val_loss: 0.6873
Epoch 5/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6862 - val_loss: 0.6848
Epoch 6/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6835 - val_loss: 0.6818
Epoch 7/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6802 - val_loss: 0.6782
Epoch 8/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6763 - val_loss: 0.6737
Epoch 9/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6714 - val_loss: 0.6682
Epoch 10/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6652 - val_loss: 0.6612

The decoded image with tf.keras:

temp = my_autoencoder.predict(x_train)

plt.imshow(temp[0].reshape(28, 28), cmap='gray')

I can't find anything wrong; does anyone know why?

The true culprit is the default learning rate used by keras.Adadelta vs tf.keras.Adadelta: 1 vs 1e-3 - see below. It's true that the keras and tf.keras implementations differ a bit, but the difference in results can't be as dramatic as what you observed; it only gets that large under a different configuration (e.g. the learning rate).

You can confirm this in your original code by running print(model.optimizer.get_config()).
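
A minimal sketch of that check (the printed keys differ between the two packages, so treat the exact output as illustrative):

import keras
import tensorflow as tf

# Standalone keras reports 'lr' (default 1.0), while tf.keras reports
# 'learning_rate' (default 1e-3) - the mismatch behind the results above.
print(keras.optimizers.Adadelta().get_config())
print(tf.keras.optimizers.Adadelta().get_config())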

import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow.keras as keras

(x_train, _), (x_test, _) = tf.keras.datasets.mnist.load_data()

x_train = x_train.astype('float32') / 255.
x_test  = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), 784))
x_test  = x_test.reshape((len(x_test), 784))  # None, 784

###############################################################################
model = keras.models.Sequential([
    keras.layers.Dense(64, input_shape=(784, ), activation='relu'),
    keras.layers.Dense(784, activation='sigmoid')
])
model.compile(optimizer=keras.optimizers.Adadelta(learning_rate=1),
              loss='binary_crossentropy')

model.fit(x_train, x_train, epochs=10, shuffle=True,
          validation_data=(x_test, x_test))

###############################################################################
temp = model.predict(x_train)
plt.imshow(temp[0].reshape(28, 28), cmap='gray')

Epoch 1/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.2229 - val_loss: 0.1668
Epoch 2/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.1497 - val_loss: 0.1337
Epoch 3/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.1253 - val_loss: 0.1152
Epoch 4/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.1103 - val_loss: 0.1033
Epoch 5/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.1009 - val_loss: 0.0962
Epoch 6/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0952 - val_loss: 0.0916
Epoch 7/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0914 - val_loss: 0.0885
Epoch 8/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0886 - val_loss: 0.0862
Epoch 9/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0865 - val_loss: 0.0844
Epoch 10/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0849 - val_loss: 0.0830

If you use adam, the tf.keras model performs better (keras and tf.keras use two different versions of the optimizers).

Most probably it has to do with the momentum of convergence on this data: convergence is very slow, so you may need to train for more epochs with a higher learning rate.

Here's an answer on why adadelta should be avoided: How to set parameters of the Adadelta Algorithm in Tensorflow correctly?

import tensorflow as tf

(x_train,_), (x_test,_) = tf.keras.datasets.mnist.load_data()

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.  
x_train = x_train.reshape((len(x_train), 784))  
x_test = x_test.reshape((len(x_test), 784))  # None, 784

# import keras
import tensorflow.keras as keras
my_autoencoder = keras.models.Sequential([
      keras.layers.Dense(64, input_shape=(784, ), activation='relu'),
      keras.layers.Dense(784, activation='sigmoid')                                             
])
my_autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

my_autoencoder.fit(x_train, x_train, epochs=10, shuffle=True, validation_data=(x_test, x_test))

Epoch 1/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.1372 - val_loss: 0.0909
Epoch 2/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0840 - val_loss: 0.0782
Epoch 3/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0773 - val_loss: 0.0753
Epoch 4/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0754 - val_loss: 0.0742
Epoch 5/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0747 - val_loss: 0.0738
Epoch 6/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0744 - val_loss: 0.0735
Epoch 7/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0741 - val_loss: 0.0734
Epoch 8/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0740 - val_loss: 0.0733
Epoch 9/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0738 - val_loss: 0.0731
Epoch 10/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0737 - val_loss: 0.0734

<tensorflow.python.keras.callbacks.History at 0x7f8c83d907b8>

NB: keras and tf.keras have slightly different implementations of Model, so internally they call different functions; it's no surprise that performance can vary.

In fact, the problem is with the optimizer, not the model. To validate that, you can try training a keras model with the tf.keras Adadelta; it will also show poor results.

import keras
import tensorflow as tf  # needed for tf.keras.optimizers.Adadelta below
# import tensorflow.keras as keras

my_autoencoder = keras.models.Sequential([
      keras.layers.Dense(64, input_shape=(784, ), activation='relu'),
      keras.layers.Dense(784, activation='sigmoid')                                             
])
my_autoencoder.compile(tf.keras.optimizers.Adadelta(), loss='binary_crossentropy')

my_autoencoder.fit(x_train, x_train, epochs=10, shuffle=True, validation_data=(x_test, x_test))

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
60000/60000 [==============================] - 6s 101us/step - loss: 0.6955 - val_loss: 0.6946
Epoch 2/10
60000/60000 [==============================] - 6s 99us/step - loss: 0.6936 - val_loss: 0.6927
Epoch 3/10
60000/60000 [==============================] - 6s 100us/step - loss: 0.6919 - val_loss: 0.6910
Epoch 4/10
60000/60000 [==============================] - 6s 96us/step - loss: 0.6901 - val_loss: 0.6892
Epoch 5/10
60000/60000 [==============================] - 6s 94us/step - loss: 0.6883 - val_loss: 0.6873
Epoch 6/10
60000/60000 [==============================] - 6s 95us/step - loss: 0.6863 - val_loss: 0.6851
Epoch 7/10
60000/60000 [==============================] - 6s 101us/step - loss: 0.6839 - val_loss: 0.6825
Epoch 8/10
60000/60000 [==============================] - 6s 101us/step - loss: 0.6812 - val_loss: 0.6794
Epoch 9/10
60000/60000 [==============================] - 6s 99us/step - loss: 0.6778 - val_loss: 0.6756
Epoch 10/10
60000/60000 [==============================] - 6s 101us/step - loss: 0.6736 - val_loss: 0.6710

<keras.callbacks.callbacks.History at 0x7f8c805bbe10>

keras and tf.keras resolve to two different optimizer classes when the optimizer argument is passed as a string.

import tensorflow as tf
# import tensorflow.keras as keras

my_autoencoder = tf.keras.models.Sequential([
      tf.keras.layers.Dense(64, input_shape=(784, ), activation='relu'),
      tf.keras.layers.Dense(784, activation='sigmoid')                                             
])
my_autoencoder.compile('adadelta', loss='binary_crossentropy')

my_autoencoder.fit(x_train, x_train, epochs=1, shuffle=True, validation_data=(x_test, x_test))
my_autoencoder.optimizer

<tensorflow.python.keras.optimizer_v2.adadelta.Adadelta at 0x7f8c7fc3ce80>

import keras
# import tensorflow.keras as keras

my_autoencoder = keras.models.Sequential([
      keras.layers.Dense(64, input_shape=(784, ), activation='relu'),
      keras.layers.Dense(784, activation='sigmoid')                                             
])
my_autoencoder.compile('adadelta', loss='binary_crossentropy')

my_autoencoder.fit(x_train, x_train, epochs=1, shuffle=True, validation_data=(x_test, x_test))
my_autoencoder.optimizer

<keras.optimizers.Adadelta at 0x7f8c7fc3c908>

So, the confusion can be avoided by importing and instantiating the optimizer explicitly, as sketched below.
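
For example, a minimal sketch (assuming the my_autoencoder model defined above; learning_rate=1.0 mirrors standalone keras's default):

from tensorflow.keras.optimizers import Adadelta

# Passing an explicit instance makes the learning rate visible at the call
# site instead of depending on whichever default the string 'adadelta'
# resolves to in the installed package.
my_autoencoder.compile(optimizer=Adadelta(learning_rate=1.0),
                       loss='binary_crossentropy')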
