
How does one use the official Batch Normalization layer in TensorFlow?

I was trying to use batch normalization to train my neural networks with TensorFlow, but it was unclear to me how to use the official layer implementation of Batch Normalization (note this is different from the one from the API).

After some painful digging through their GitHub issues, it seems that one needs a tf.cond to use it properly, and also a 'reuse=True' flag so that the BN shift and scale variables are properly reused. After figuring that out, I provided a small description of what I believe is the right way to use it here.

Now I have written a short script to test it (only a single layer and a ReLU; hard to make it smaller than this). However, I am not 100% sure how to test it. Right now my code runs with no error messages but unexpectedly returns NaNs, which lowers my confidence that the code I gave in the other post is right. Or maybe the network I have is weird. Either way, does someone know what's wrong? Here is the code:

import tensorflow as tf
# download and install the MNIST data automatically
from tensorflow.examples.tutorials.mnist import input_data
from tensorflow.contrib.layers.python.layers import batch_norm as batch_norm

def batch_norm_layer(x, train_phase, scope_bn):
    bn_train = batch_norm(x, decay=0.999, center=True, scale=True,
                          is_training=True,
                          reuse=None, # is this right?
                          trainable=True,
                          scope=scope_bn)

    bn_inference = batch_norm(x, decay=0.999, center=True, scale=True,
                              is_training=False,
                              reuse=True, # is this right?
                              trainable=True,
                              scope=scope_bn)

    z = tf.cond(train_phase, lambda: bn_train, lambda: bn_inference)
    return z

def get_NN_layer(x, input_dim, output_dim, scope, train_phase):
    with tf.name_scope(scope+'vars'):
        W = tf.Variable(tf.truncated_normal(shape=[input_dim, output_dim], mean=0.0, stddev=0.1))
        b = tf.Variable(tf.constant(0.1, shape=[output_dim]))
    with tf.name_scope(scope+'Z'):
        z = tf.matmul(x,W) + b
    with tf.name_scope(scope+'BN'):
        if train_phase is not None:
            z = batch_norm_layer(z,train_phase,scope+'BN_unit')
    with tf.name_scope(scope+'A'):
        a = tf.nn.relu(z) # (M x D1) = (M x D) * (D x D1)
    return a

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
# placeholder for data
x = tf.placeholder(tf.float32, [None, 784])
# placeholder that turns BN on during training and off during inference
train_phase = tf.placeholder(tf.bool, name='phase_train')
# variables for parameters
hidden_units = 25
layer1 = get_NN_layer(x, input_dim=784, output_dim=hidden_units, scope='layer1', train_phase=train_phase)
# create model
W_final = tf.Variable(tf.truncated_normal(shape=[hidden_units, 10], mean=0.0, stddev=0.1))
b_final = tf.Variable(tf.constant(0.1, shape=[10]))
y = tf.nn.softmax(tf.matmul(layer1, W_final) + b_final)

### training
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean( -tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]) )
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    steps = 3000
    for iter_step in xrange(steps):
        #feed_dict_batch = get_batch_feed(X_train, Y_train, M, phase_train)
        batch_xs, batch_ys = mnist.train.next_batch(100)
        # Collect model statistics
        if iter_step%1000 == 0:
            batch_xtrain, batch_ytrain = batch_xs, batch_ys # simulates train data
            batch_xcv, batch_ycv = mnist.test.next_batch(5000) # simulates CV data
            batch_xtest, batch_ytest = mnist.test.next_batch(5000) # simulates test data
            # do inference
            train_error = sess.run(fetches=cross_entropy, feed_dict={x: batch_xs, y_:batch_ys, train_phase: False})
            cv_error = sess.run(fetches=cross_entropy, feed_dict={x: batch_xcv, y_:batch_ycv, train_phase: False})
            test_error = sess.run(fetches=cross_entropy, feed_dict={x: batch_xtest, y_:batch_ytest, train_phase: False})

            def do_stuff_with_errors(*args):
                print args
            do_stuff_with_errors(train_error, cv_error, test_error)
        # Run Train Step
        sess.run(fetches=train_step, feed_dict={x: batch_xs, y_:batch_ys, train_phase: True})
    # list of booleans indicating correct predictions
    correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
    # accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels, train_phase: False}))

When I run it I get:

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
(2.3474066, 2.3498712, 2.3461707)
(0.49414295, 0.88536006, 0.91152304)
(0.51632041, 0.393666, nan)
0.9296

It used to be that all of the last values were NaN, and now only a few of them are. Is everything fine or am I being paranoid?

I am not sure if this will solve your problem; the documentation for BatchNorm is not quite easy to use/informative, so here is a short recap on how to use simple BatchNorm:

First of all, you define your BatchNorm layer. If you want to use it after an affine/fully-connected layer, you do this (just an example; the order can be different, as you desire):

...
inputs = tf.matmul(inputs, W) + b
inputs = tf.layers.batch_normalization(inputs, training=is_training)
inputs = tf.nn.relu(inputs)
...

The function tf.layers.batch_normalization creates internal variables together with the update operations for the moving mean and variance; those update ops are collected in tf.GraphKeys.UPDATE_OPS and have to be run explicitly. As such, you must call your optimizer function as follows (after all layers have been defined!):

...
extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(extra_update_ops):
    trainer = tf.train.AdamOptimizer() 
    updateModel = trainer.minimize(loss, global_step=global_step)
...

You can read more about it here. I know it's a little late to answer your question, but it might help other people coming across BatchNorm problems in TensorFlow! :)
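Putting the two snippets together, here is a minimal, self-contained sketch of a single batch-normalized layer for MNIST-sized inputs. The layer sizes, the learning rate, and the names used below (is_training, loss, train_step, etc.) are illustrative assumptions, not code from the original answer:

import tensorflow as tf

# placeholders for data, labels, and the BN mode
x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])
is_training = tf.placeholder(tf.bool, name='is_training')

# affine layer -> batch norm -> ReLU
W = tf.Variable(tf.truncated_normal([784, 25], stddev=0.1))
b = tf.Variable(tf.constant(0.1, shape=[25]))
h = tf.matmul(x, W) + b
h = tf.layers.batch_normalization(h, training=is_training)
h = tf.nn.relu(h)

# output layer and a numerically stable cross-entropy loss
W_out = tf.Variable(tf.truncated_normal([25, 10], stddev=0.1))
b_out = tf.Variable(tf.constant(0.1, shape=[10]))
logits = tf.matmul(h, W_out) + b_out
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))

# the moving-average update ops live in UPDATE_OPS; run them with every train step
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_step = tf.train.AdamOptimizer(1e-3).minimize(loss)

At training time you would run train_step with is_training fed as True; at evaluation time you run only the forward pass with is_training fed as False, so the layer uses the accumulated moving averages instead of the batch statistics.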

training = tf.placeholder(tf.bool, name='training')

lr_holder = tf.placeholder(tf.float32, [], name='learning_rate')
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    optimizer = tf.train.AdamOptimizer(learning_rate=lr_holder).minimize(cost)

When defining the layers, you need to use the placeholder 'training':

batchNormal_layer = tf.layers.batch_normalization(pre_batchNormal_layer, training=training)
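At train time you then feed training=True, and at inference time training=False. A small usage sketch under assumptions not shown in the snippet above (the input placeholders x and y_, the data batches, the session, and the learning-rate value are all illustrative):

# training step: use batch statistics and update the moving averages
sess.run(optimizer, feed_dict={x: batch_x, y_: batch_y,
                               lr_holder: 0.001, training: True})

# evaluation/inference: use the accumulated moving averages
sess.run(cost, feed_dict={x: test_x, y_: test_y, training: False})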
