
The model breaks when I replace keras with tf.keras

When I tried to use keras to build a simple autoencoder, I noticed something strange about the difference between keras and tf.keras.

import tensorflow as tf

tf.__version__

2.2.0

(x_train,_), (x_test,_) = tf.keras.datasets.mnist.load_data()

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.  
x_train = x_train.reshape((len(x_train), 784))  
x_test = x_test.reshape((len(x_test), 784))  # None, 784

The original picture

import matplotlib.pyplot as plt

plt.imshow(x_train[0].reshape(28, 28), cmap='gray')


import keras
# import tensorflow.keras as keras

my_autoencoder = keras.models.Sequential([
      keras.layers.Dense(64, input_shape=(784, ), activation='relu'),
      keras.layers.Dense(784, activation='sigmoid')                                             
])
my_autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

my_autoencoder.fit(x_train, x_train, epochs=10, shuffle=True, validation_data=(x_test, x_test))

training

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
60000/60000 [==============================] - 7s 112us/step - loss: 0.2233 - val_loss: 0.1670
Epoch 2/10
60000/60000 [==============================] - 7s 111us/step - loss: 0.1498 - val_loss: 0.1337
Epoch 3/10
60000/60000 [==============================] - 7s 110us/step - loss: 0.1254 - val_loss: 0.1152
Epoch 4/10
60000/60000 [==============================] - 7s 110us/step - loss: 0.1103 - val_loss: 0.1032
Epoch 5/10
60000/60000 [==============================] - 7s 110us/step - loss: 0.1010 - val_loss: 0.0963
Epoch 6/10
60000/60000 [==============================] - 7s 109us/step - loss: 0.0954 - val_loss: 0.0919
Epoch 7/10
60000/60000 [==============================] - 7s 109us/step - loss: 0.0917 - val_loss: 0.0889
Epoch 8/10
60000/60000 [==============================] - 7s 110us/step - loss: 0.0890 - val_loss: 0.0866
Epoch 9/10
60000/60000 [==============================] - 7s 110us/step - loss: 0.0870 - val_loss: 0.0850
Epoch 10/10
60000/60000 [==============================] - 7s 109us/step - loss: 0.0853 - val_loss: 0.0835

the decoded image with keras

temp = my_autoencoder.predict(x_train)

plt.imshow(temp[0].reshape(28, 28), cmap='gray')


So far, everything is as expected, but something weird happens when I replace keras with tf.keras:

# import keras
import tensorflow.keras as keras
my_autoencoder = keras.models.Sequential([
      keras.layers.Dense(64, input_shape=(784, ), activation='relu'),
      keras.layers.Dense(784, activation='sigmoid')                                             
])
my_autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

my_autoencoder.fit(x_train, x_train, epochs=10, shuffle=True, validation_data=(x_test, x_test))

training

Epoch 1/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6952 - val_loss: 0.6940
Epoch 2/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6929 - val_loss: 0.6918
Epoch 3/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6907 - val_loss: 0.6896
Epoch 4/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6885 - val_loss: 0.6873
Epoch 5/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6862 - val_loss: 0.6848
Epoch 6/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6835 - val_loss: 0.6818
Epoch 7/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6802 - val_loss: 0.6782
Epoch 8/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6763 - val_loss: 0.6737
Epoch 9/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6714 - val_loss: 0.6682
Epoch 10/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6652 - val_loss: 0.6612

the decoded image with tf.keras

temp = my_autoencoder.predict(x_train)

plt.imshow(temp[0].reshape(28, 28), cmap='gray')

I can't find anything wrong. Does anyone know why?

The true culprit is the default learning rate used by keras.Adadelta vs tf.keras.Adadelta: 1.0 vs 0.001 (see below). It's true that the keras and tf.keras implementations differ a bit, but a difference in results can't be as dramatic as what you observed; it can only come from a different configuration, e.g., the learning rate.

You can confirm this in your original code by running print(my_autoencoder.optimizer.get_config()).
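
For instance, instantiating both optimizers with their defaults makes the gap explicit (a minimal sketch; the exact config key names can vary slightly between Keras versions):

import keras              # standalone Keras
import tensorflow as tf   # tf.keras

# Both get_config() calls exist in standalone Keras and tf.keras;
# only the default learning rates differ.
print(keras.optimizers.Adadelta().get_config())     # learning_rate should be 1.0
print(tf.keras.optimizers.Adadelta().get_config())  # learning_rate should be 0.001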

import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow.keras as keras

(x_train, _), (x_test, _) = tf.keras.datasets.mnist.load_data()

x_train = x_train.astype('float32') / 255.
x_test  = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), 784))
x_test  = x_test.reshape((len(x_test), 784))  # None, 784

###############################################################################
model = keras.models.Sequential([
    keras.layers.Dense(64, input_shape=(784, ), activation='relu'),
    keras.layers.Dense(784, activation='sigmoid')
])
model.compile(optimizer=keras.optimizers.Adadelta(learning_rate=1),
              loss='binary_crossentropy')

model.fit(x_train, x_train, epochs=10, shuffle=True,
          validation_data=(x_test, x_test))

###############################################################################
temp = model.predict(x_train)
plt.imshow(temp[0].reshape(28, 28), cmap='gray')

Epoch 1/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.2229 - val_loss: 0.1668
Epoch 2/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.1497 - val_loss: 0.1337
Epoch 3/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.1253 - val_loss: 0.1152
Epoch 4/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.1103 - val_loss: 0.1033
Epoch 5/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.1009 - val_loss: 0.0962
Epoch 6/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0952 - val_loss: 0.0916
Epoch 7/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0914 - val_loss: 0.0885
Epoch 8/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0886 - val_loss: 0.0862
Epoch 9/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0865 - val_loss: 0.0844
Epoch 10/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0849 - val_loss: 0.0830

If you use adam, the tf.keras model performs better (keras and tf.keras use two different versions of the optimizer).

Most probably, this has to do with momentum and convergence on this data: Adadelta converges very slowly here, so you may need to train for more epochs with a higher learning rate.
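
If you do want to keep Adadelta, here is a minimal sketch of that workaround, reusing my_autoencoder and x_train from the question (the 1.0 learning rate mirrors the standalone-keras default; the epoch count is an arbitrary choice):

import tensorflow.keras as keras

# Raise tf.keras Adadelta's learning rate from its 0.001 default to 1.0
# and train longer; 50 epochs is an arbitrary choice here.
my_autoencoder.compile(optimizer=keras.optimizers.Adadelta(learning_rate=1.0),
                       loss='binary_crossentropy')
my_autoencoder.fit(x_train, x_train, epochs=50, shuffle=True,
                   validation_data=(x_test, x_test))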

Here's an answer on why adadelta should be avoided: How to set parameters of the Adadelta Algorithm in Tensorflow correctly? Below, the same model is trained with adam instead:

import tensorflow as tf

(x_train,_), (x_test,_) = tf.keras.datasets.mnist.load_data()

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.  
x_train = x_train.reshape((len(x_train), 784))  
x_test = x_test.reshape((len(x_test), 784))  # None, 784

# import keras
import tensorflow.keras as keras
my_autoencoder = keras.models.Sequential([
      keras.layers.Dense(64, input_shape=(784, ), activation='relu'),
      keras.layers.Dense(784, activation='sigmoid')                                             
])
my_autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

my_autoencoder.fit(x_train, x_train, epochs=10, shuffle=True, validation_data=(x_test, x_test))

Epoch 1/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.1372 - val_loss: 0.0909
Epoch 2/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0840 - val_loss: 0.0782
Epoch 3/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0773 - val_loss: 0.0753
Epoch 4/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0754 - val_loss: 0.0742
Epoch 5/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0747 - val_loss: 0.0738
Epoch 6/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0744 - val_loss: 0.0735
Epoch 7/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0741 - val_loss: 0.0734
Epoch 8/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0740 - val_loss: 0.0733
Epoch 9/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0738 - val_loss: 0.0731
Epoch 10/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0737 - val_loss: 0.0734

<tensorflow.python.keras.callbacks.History at 0x7f8c83d907b8>

NB: keras and tf.keras have slightly different implementations of Model, so internally they call different functions; it's no surprise that performance can vary.
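
You can see that the two Model classes really are distinct (a minimal sketch; the exact module paths may vary by version):

import keras
import tensorflow as tf

# The string representations show two separate class hierarchies, e.g.
# keras.engine.sequential.Sequential vs
# tensorflow.python.keras.engine.sequential.Sequential.
print(keras.models.Sequential)
print(tf.keras.models.Sequential)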

In fact, the problem is with the optimizer, not the model. To validate this, you can train a keras model with the tf.keras Adadelta; it will also show poor results.

import tensorflow as tf
import keras
# import tensorflow.keras as keras

my_autoencoder = keras.models.Sequential([
      keras.layers.Dense(64, input_shape=(784, ), activation='relu'),
      keras.layers.Dense(784, activation='sigmoid')                                             
])
my_autoencoder.compile(tf.keras.optimizers.Adadelta(), loss='binary_crossentropy')

my_autoencoder.fit(x_train, x_train, epochs=10, shuffle=True, validation_data=(x_test, x_test))

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
60000/60000 [==============================] - 6s 101us/step - loss: 0.6955 - val_loss: 0.6946
Epoch 2/10
60000/60000 [==============================] - 6s 99us/step - loss: 0.6936 - val_loss: 0.6927
Epoch 3/10
60000/60000 [==============================] - 6s 100us/step - loss: 0.6919 - val_loss: 0.6910
Epoch 4/10
60000/60000 [==============================] - 6s 96us/step - loss: 0.6901 - val_loss: 0.6892
Epoch 5/10
60000/60000 [==============================] - 6s 94us/step - loss: 0.6883 - val_loss: 0.6873
Epoch 6/10
60000/60000 [==============================] - 6s 95us/step - loss: 0.6863 - val_loss: 0.6851
Epoch 7/10
60000/60000 [==============================] - 6s 101us/step - loss: 0.6839 - val_loss: 0.6825
Epoch 8/10
60000/60000 [==============================] - 6s 101us/step - loss: 0.6812 - val_loss: 0.6794
Epoch 9/10
60000/60000 [==============================] - 6s 99us/step - loss: 0.6778 - val_loss: 0.6756
Epoch 10/10
60000/60000 [==============================] - 6s 101us/step - loss: 0.6736 - val_loss: 0.6710

<keras.callbacks.callbacks.History at 0x7f8c805bbe10>

keras and tf.keras resolve the optimizer parameter to two different classes when it is passed as a string.
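
A quick way to see this, without compiling a model, is each library's optimizers.get helper (a minimal sketch):

import keras
import tensorflow as tf

# Each library resolves the string through its own registry:
opt_keras = keras.optimizers.get('adadelta')  # -> keras.optimizers.Adadelta
opt_tf = tf.keras.optimizers.get('adadelta')  # -> tf.keras optimizer_v2 Adadelta
print(type(opt_keras), type(opt_tf))

The same difference shows up after compiling: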

import tensorflow as tf
# import tensorflow.keras as keras

my_autoencoder = tf.keras.models.Sequential([
      tf.keras.layers.Dense(64, input_shape=(784, ), activation='relu'),
      tf.keras.layers.Dense(784, activation='sigmoid')                                             
])
my_autoencoder.compile('adadelta', loss='binary_crossentropy')

my_autoencoder.fit(x_train, x_train, epochs=1, shuffle=True, validation_data=(x_test, x_test))
my_autoencoder.optimizer

<tensorflow.python.keras.optimizer_v2.adadelta.Adadelta at 0x7f8c7fc3ce80>

import keras
# import tensorflow.keras as keras

my_autoencoder = keras.models.Sequential([
      keras.layers.Dense(64, input_shape=(784, ), activation='relu'),
      keras.layers.Dense(784, activation='sigmoid')                                             
])
my_autoencoder.compile('adadelta', loss='binary_crossentropy')

my_autoencoder.fit(x_train, x_train, epochs=1, shuffle=True, validation_data=(x_test, x_test))
my_autoencoder.optimizer

<keras.optimizers.Adadelta at 0x7f8c7fc3c908>

So, the confusion can be avoided by importing and instantiating the optimizer explicitly instead of passing a string.
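
For example, a minimal sketch reusing my_autoencoder from above; an explicit instance pins the learning rate no matter which Keras you import:

from tensorflow.keras.optimizers import Adadelta

# An explicit instance leaves no room for ambiguity about defaults:
my_autoencoder.compile(optimizer=Adadelta(learning_rate=1.0),
                       loss='binary_crossentropy')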
