
Why is the loss of my autoencoder not going down at all during training?

I am following this tutorial to create a Keras-based autoencoder, but using my own data. That dataset includes about 20k training and about 4k validation images. All of them are very similar and show the very same object. I haven't modified the Keras model layout from the tutorial; I only changed the input size, since I use 300x300 images. So my model looks like this:

Model: "autoencoder"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         [(None, 300, 300, 1)]     0
_________________________________________________________________
encoder (Functional)         (None, 16)                5779216
_________________________________________________________________
decoder (Functional)         (None, 300, 300, 1)       6176065
=================================================================
Total params: 11,955,281
Trainable params: 11,954,897
Non-trainable params: 384
_________________________________________________________________
Model: "encoder"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         [(None, 300, 300, 1)]     0
_________________________________________________________________
conv2d (Conv2D)              (None, 150, 150, 32)      320
_________________________________________________________________
leaky_re_lu (LeakyReLU)      (None, 150, 150, 32)      0
_________________________________________________________________
batch_normalization (BatchNo (None, 150, 150, 32)      128
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 75, 75, 64)        18496
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU)    (None, 75, 75, 64)        0
_________________________________________________________________
batch_normalization_1 (Batch (None, 75, 75, 64)        256
_________________________________________________________________
flatten (Flatten)            (None, 360000)            0
_________________________________________________________________
dense (Dense)                (None, 16)                5760016
=================================================================
Total params: 5,779,216
Trainable params: 5,779,024
Non-trainable params: 192
_________________________________________________________________
Model: "decoder"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_2 (InputLayer)         [(None, 16)]              0
_________________________________________________________________
dense_1 (Dense)              (None, 360000)            6120000
_________________________________________________________________
reshape (Reshape)            (None, 75, 75, 64)        0
_________________________________________________________________
conv2d_transpose (Conv2DTran (None, 150, 150, 64)      36928
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU)    (None, 150, 150, 64)      0
_________________________________________________________________
batch_normalization_2 (Batch (None, 150, 150, 64)      256
_________________________________________________________________
conv2d_transpose_1 (Conv2DTr (None, 300, 300, 32)      18464
_________________________________________________________________
leaky_re_lu_3 (LeakyReLU)    (None, 300, 300, 32)      0
_________________________________________________________________
batch_normalization_3 (Batch (None, 300, 300, 32)      128
_________________________________________________________________
conv2d_transpose_2 (Conv2DTr (None, 300, 300, 1)       289
_________________________________________________________________
activation (Activation)      (None, 300, 300, 1)       0
=================================================================
Total params: 6,176,065
Trainable params: 6,175,873
Non-trainable params: 192

Then I initialize my model like this:

IMGSIZE = 300
EPOCHS = 20
LR = 0.0001

(encoder, decoder, autoencoder) = ConvAutoencoder.build(IMGSIZE, IMGSIZE, 1)
sched = ExponentialDecay(initial_learning_rate=LR, decay_steps=EPOCHS, decay_rate=LR / EPOCHS)
autoencoder.compile(loss="mean_squared_error", optimizer=Adam(learning_rate=sched))

Then I train my model like this:

image_generator = ImageDataGenerator(rescale=1.0 / 255)
train_gen = image_generator.flow_from_directory(
    os.path.join(args.images, "training"),
    class_mode="input",
    color_mode="grayscale",
    target_size=(IMGSIZE, IMGSIZE),
    batch_size=BS,
)
val_gen = image_generator.flow_from_directory(
    os.path.join(args.images, "validation"),
    class_mode="input",
    color_mode="grayscale",
    target_size=(IMGSIZE, IMGSIZE),
    batch_size=BS,
)
hist = autoencoder.fit(train_gen, validation_data=val_gen, epochs=EPOCHS, batch_size=BS)

My batch size BS is 32 and I start with an initial Adam learning rate of 0.001 (but I also tried values from 0.1 down to 0.0001). I also tried increasing the latent dimensionality to something like 1024, but that doesn't solve my issue either.

Now during training the loss goes down in the first epoch from about 0.5 to about 0.2. Beginning with the second epoch, the loss sticks at the very same value, e.g. 0.1989, and stays there "forever", regardless of how many epochs I train for and/or the initial learning rate I use.

Any ideas what could be the problem here?

It could be that the decay_rate argument in tf.keras.optimizers.schedules.ExponentialDecay is decaying your learning rate more quickly than you think, effectively driving your learning rate to zero.
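
ExponentialDecay evaluates to initial_learning_rate * decay_rate ** (step / decay_steps), where step counts optimizer updates (batches), not epochs. A minimal sketch of what the schedule from the question produces, assuming roughly 20k training images and a batch size of 32 (those step counts are estimates, not taken from the question's code):

from tensorflow.keras.optimizers.schedules import ExponentialDecay

EPOCHS = 20
LR = 0.0001

sched = ExponentialDecay(
    initial_learning_rate=LR,
    decay_steps=EPOCHS,      # interpreted as 20 optimizer steps, not 20 epochs
    decay_rate=LR / EPOCHS,  # 5e-06 per decay_steps
)

# decayed LR = LR * decay_rate ** (step / decay_steps)
for step in [0, 20, 100, 625]:  # ~625 steps is about one epoch at 20k images / batch size 32
    print(step, float(sched(step)))
# After only 20 batches the LR is already 1e-4 * 5e-6 = 5e-10, i.e. effectively zero.

# A schedule that decays gently per epoch might instead look like this (assumed values):
steps_per_epoch = 20000 // 32
saner_sched = ExponentialDecay(
    initial_learning_rate=LR,
    decay_steps=steps_per_epoch,
    decay_rate=0.96,
)

So the learning rate has collapsed to essentially zero well before the first epoch ends, which matches the loss plateauing from the second epoch onward. Using a fixed learning rate, or a decay_rate close to 1 with decay_steps measured in optimizer steps, avoids this.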
