
Why would moving_mean and moving_variance in the TensorFlow BN layer become NaN when I set is_training=False at training time?

At training time, I want to keep the BN layer unchanged, so I pass is_training=False to:

tf.contrib.layers.batch_norm(tensor_go_next, decay=0.9, center=True, scale=True, epsilon=1e-9,
                             updates_collections=tf.GraphKeys.UPDATE_OPS,
                             is_training=False, scope=name_bn_scope)

and didn't put name_bn_scope/gamma:0 or name_bn_scope/beta:0 into the training var_list (one way to build such a var_list is sketched below).
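For reference, one way to keep those parameters out of the trained variables is to filter tf.trainable_variables() by scope name. This is a minimal sketch, not the asker's actual code; it assumes the BN variables all live under name_bn_scope:

# Sketch: build var_list_to_train without the BN scope's gamma/beta.
# Any variable whose name starts with 'name_bn_scope/' is excluded.
var_list_to_train = [v for v in tf.trainable_variables()
                     if not v.name.startswith('name_bn_scope/')]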

After training, gamma and beta are indeed unchanged, which is exactly what I want. But moving_mean and moving_variance become NaN matrices after training, which drops accuracy to 0.1%.

I don't understand why: doesn't is_training=False force TensorFlow to keep moving_mean and moving_variance unchanged? How can I fix this?

The BN layer has tortured me for so long. Please help me!

Aha, I figured it out: the code block shown below should be commented out. (It forces TensorFlow to run the BN layer's moving_mean/moving_variance update ops whenever train_op runs; since I don't want to change them during training, it should be removed.)

# This block forces the BN update ops (moving_mean/moving_variance) collected
# in UPDATE_OPS to run every time train_op runs; removing it keeps them frozen.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss, name='train_op', var_list=var_list_to_train)
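With that block removed, train_op is created without the control dependency, so running it never executes the BN update ops. A minimal sketch of the fixed version:

# Fixed version: no control dependency on UPDATE_OPS, so the
# moving_mean/moving_variance update ops never run during training.
train_op = optimizer.minimize(loss, name='train_op', var_list=var_list_to_train)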

I also learned that when you are stuck on a bug, stepping outside for a break may be the best way to locate and solve it, a bit like the tricks in deep learning for escaping a local minimum.
