
Training loss does not decrease

I am trying to implement an autoencoder using CNNs in TensorFlow. First I trained my model on the MNIST dataset and everything worked perfectly: the loss got low, and when I ran inference the model worked well (producing good output images). But when I then tested the network on the CelebA dataset, the model fails and the loss never decreases. Training runs fast, and I tried decreasing the learning rate; even with a lower learning rate there is not much difference in the time it takes to train.

Here I will try to include all of the code I use.

**Note:** I've set up a GitHub repository as well, in case it's easier for you to read the code there.

self.batch_size = 64
self.shape = shape

self.output_height = 64
self.output_width = 64
self.gf_dim = 64
self.c_dim = 3

self.strides_size = 2
self.kernel_size = 2
self.padding = 'SAME'
def encoder_conv_net(self, input_):

    self.conv1 = Model.batch_norm(self, Model.conv_2d(self, input_, [3,3,self.c_dim,32], name = 'conv1'))

    self.conv2 = Model.batch_norm(self, Model.conv_2d(self, self.conv1, [3,3,32,64], name = 'conv2'))

    self.conv3 = Model.batch_norm(self, Model.conv_2d(self, self.conv2, [3,3,64,128], name = 'conv3'))

    self.conv4 = Model.batch_norm(self, Model.conv_2d(self, self.conv3, [3,3,128,128], name = 'conv4'))

    fc = tf.reshape(self.conv4, [ -1, 512 ])

    dropout1 = tf.nn.dropout(fc, keep_prob=0.5)

    fc1 = Model.fully_connected(self, dropout1, 512)
    return tf.nn.tanh(fc1)

def decoder_conv_net(self, 
                     input_,
                     shape):

    g_width, g_height = shape[1], shape[0]
    g_width2, g_height2 = np.ceil(shape[1]/2), np.ceil(shape[0]/2)
    g_width4, g_height4 = np.ceil(shape[1]/4), np.ceil(shape[0]/4)
    g_width8, g_height8 = np.ceil(shape[1]/8), np.ceil(shape[0]/8)

    input_ = tf.reshape(input_, [-1, 4, 4, 128])

    print(input_.shape, g_width8, self.gf_dim)
    deconv1 = Model.deconv_2d(self, input_, [self.batch_size, g_width8, g_height8, self.gf_dim * 2],
                              [5,5],
                              name = 'deconv_1')

    deconv2 = Model.deconv_2d(self, deconv1, [self.batch_size, g_width4, g_height4, self.gf_dim * 2],
                              [5,5],
                              name = 'deconv_2')

    deconv3 = Model.deconv_2d(self, deconv2, [self.batch_size, g_width2, g_height2, self.gf_dim],
                              [5,5],
                              name = 'deconv_3')

    deconv4 = Model.deconv_2d(self, deconv3, [self.batch_size, g_width, g_height, self.c_dim],
                              [5,5],
                              name = 'deconv_4',
                              relu = False)

    return tf.nn.tanh(deconv4)

These are the functions for the model's encoder and decoder.

The main function looks like this:

dataset = tf.data.Dataset.from_tensor_slices(filenames)
dataset = dataset.shuffle(len(filenames))
dataset = dataset.map(parse_function, num_parallel_calls=4)
#dataset = dataset.map(train_preprocess, num_parallel_calls=4)
dataset = dataset.repeat().batch(batch_size)
#dataset = dataset.apply(tf.contrib.data.batch_and_drop_remainder(batch_size))
dataset = dataset.prefetch(1)

iterator = tf.data.Iterator.from_structure(dataset.output_types,
                                           dataset.output_shapes)

next_element = iterator.get_next()
init_op = iterator.make_initializer(dataset)

#print(next_element)
x = next_element
#plt.imshow(x)
#x = tf.reshape(x, [64, 64, 64, 3])

ENC = Encoder(shape)
DEC = Decoder(shape)

encoding = ENC.encoder_conv_net(x)

print("Encoding output shape " + str(encoding.shape))    

output = DEC.decoder_conv_net(encoding, [64,64])

print(output.shape)
loss = tf.reduce_mean(tf.squared_difference(x, output))

opt = tf.train.AdamOptimizer(learning_rate=0.1e-5)
train = opt.minimize(loss)
saver = tf.train.Saver()
init = tf.global_variables_initializer()

I run the training session in the usual way:

with tf.Session(graph=graph) as sess:
  #saver.restore(sess, '')

  sess.run(init) 
  sess.run(init_op)

  a = sess.run(next_element)

  for ind in tqdm(range(nb_epoch)):    
      loss_acc, outputs, _ = sess.run([loss, output, train])
      print(loss_acc)

      if ind % 40 == 0:
          print(loss_acc)
          saver.save(sess, save_path = "./checkpoints/" \
                       "/model_face.ckpt", global_step = ind) 

After all of this, training starts without an error, but my loss does not decrease.

Here are the utility functions as well:

def parse_function(filename):
  image_string = tf.read_file(filename)
  image = tf.image.decode_jpeg(image_string, channels=3)
  image = tf.image.convert_image_dtype(image, tf.float32)
  image = tf.image.resize_images(image, [64, 64])
  return image

def train_preprocess(image):
  image = tf.image.random_flip_left_right(image)
  image = tf.image.random_brightness(image, max_delta=32.0 / 255.0)
  image = tf.image.random_saturation(image, lower=0.5, upper=1.5)
  image = tf.clip_by_value(image, 0.0, 1.0)
  return image

By changing the activation function to softmax, which is more suitable for your image encoding:

image = tf.clip_by_value(image, 0.0, 1.0)
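
For illustration, here is a minimal sketch of that kind of change (my own sketch, not code from the question): the targets are kept in [0, 1] inside parse_function, and the decoder's final tanh, whose [-1, 1] range cannot match [0, 1] targets, is replaced with an activation whose output also lies in [0, 1] (a sigmoid is used below as one such choice).

import tensorflow as tf

def parse_function(filename):
  image_string = tf.read_file(filename)
  image = tf.image.decode_jpeg(image_string, channels=3)
  image = tf.image.convert_image_dtype(image, tf.float32)
  image = tf.image.resize_images(image, [64, 64])
  image = tf.clip_by_value(image, 0.0, 1.0)   # keep the targets in [0, 1]
  return image

# in decoder_conv_net, the last line would then become (sketch):
#     return tf.nn.sigmoid(deconv4)   # output range [0, 1] matches the targets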

The loss starts at 0.14066154.

Increasing the number of training epochs, the loss gets as low as ~0.08216808, which is reasonable given that I've only trained the model for a couple of minutes on a single Titan Xp.

Can you print the values of x, outputs and the gradients? My first thoughts about the unchanged loss are:

1. If x is always zero, then the output stays the same and the loss stays the same.
2. If x is not zero but is the same at every step, and the gradient is always zero (so the weights don't update), then the output stays the same and the loss stays the same.

But since you can successfully run the model on MNIST, the model itself appears to be fine, so I suspect the problem is more likely in the data.
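
As a concrete illustration of that check, a small TF1-style sketch run in the same script as the code above (the grads tensor and the printed statistics are my additions; loss, output, x, train, init, init_op and graph are the ones defined in the question):

# gradients of the loss with respect to every trainable variable
grads = tf.gradients(loss, tf.trainable_variables())

with tf.Session(graph=graph) as sess:
  sess.run(init)
  sess.run(init_op)
  for step in range(10):
      loss_val, x_val, out_val, grad_vals, _ = sess.run(
          [loss, x, output, grads, train])
      print("step", step,
            "loss", loss_val,
            "x mean", x_val.mean(),            # is the input always zero?
            "output mean", out_val.mean(),     # does the output change?
            "max |grad|", max(abs(g).max() for g in grad_vals if g is not None))  # do the weights get a gradient?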
