How to correctly create a batch normalization layer for a convolutional layer in TensorFlow?

I was looking at the official batch normalization (BN) layer in TensorFlow, but the documentation doesn't really explain how to use it for a convolutional layer. Does someone know how to do this? In particular, it is important that it applies and learns the same parameters per feature map (rather than per activation). In other words, it should apply and learn BN per filter.

As a specific toy example, say I want to do conv2d with BN on MNIST (essentially 2D data). Thus one could do:

W_conv1 = weight_variable([5, 5, 1, 32]) # 32 filters of size 5x5
x_image = tf.reshape(x, [-1,28,28,1]) # MNIST image
conv = tf.nn.conv2d(x_image, W_conv1, strides=[1, 1, 1, 1], padding='VALID') # [?,24,24,32]
z = conv # [?,24,24,32]
z = BN(z) # [?,24,24,32], essentially only 32 different scale and shift parameters to learn, one per filter
a = tf.nn.relu(z) # [?,24,24,32]

Where z = BN(z) applies BN to each feature map created by each individual filter. In pseudocode:

x_patch = x[h:h+5, w:w+5, 1] # patch on which the convolution is computed
z[h,w,f] = x_patch * W[:,:,f] = tf.matmul(x_patch, W[:,:,f]) # actual matrix multiplication for the convolution

and then we have a proper batch norm layer applied to it (in pseudocode, omitting important details):

z[h,w,f] = BN(z[h,w,f]) = scale[f] * (z[h,w,f] - mu[f]) / sigma[f] + shift[f]

i.e. for each filter f we apply BN.
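
To make that concrete, here is a rough, untested sketch of the behaviour I am after using the lower-level tf.nn ops (the helper name is made up, and the moving averages needed at test time are omitted):

import tensorflow as tf

def conv_batch_norm(z, epsilon=1e-5):
    # z is the conv output, shape [batch, height, width, F]
    F = z.get_shape()[-1].value
    scale = tf.Variable(tf.ones([F]))   # gamma: one scale per filter
    shift = tf.Variable(tf.zeros([F]))  # beta: one shift per filter
    # moments over batch, height and width give one mean/variance per filter
    mu, var = tf.nn.moments(z, axes=[0, 1, 2])
    return tf.nn.batch_normalization(z, mu, var, shift, scale, epsilon)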

IMPORTANT: the links I provide here refer to the tf.contrib.layers.batch_norm module, not the usual tf.nn one (see the comments and post below).

I didn't test it, but the way TF expects you to use it seems to be documented in the convolution2d docstring:

def convolution2d(inputs,
              num_outputs,
              kernel_size,
              stride=1,
              padding='SAME',
              activation_fn=nn.relu,
              normalizer_fn=None,
              normalizer_params=None,
              weights_initializer=initializers.xavier_initializer(),
              weights_regularizer=None,
              biases_initializer=init_ops.zeros_initializer,
              biases_regularizer=None,
              reuse=None,
              variables_collections=None,
              outputs_collections=None,
              trainable=True,
              scope=None):
  """Adds a 2D convolution followed by an optional batch_norm layer.
  `convolution2d` creates a variable called `weights`, representing the
  convolutional kernel, that is convolved with the `inputs` to produce a
  `Tensor` of activations. If a `normalizer_fn` is provided (such as
  `batch_norm`), it is then applied. Otherwise, if `normalizer_fn` is
  None and a `biases_initializer` is provided then a `biases` variable would be
  created and added to the activations.

Following this suggestion, you should pass normalizer_fn=tf.contrib.layers.batch_norm as a parameter to your convolution2d call, as in the sketch below.
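
For illustration only (I have not run this), such a call could look as follows, reusing the names from the MNIST snippet in the question:

import tensorflow as tf

x_image = tf.placeholder(tf.float32, shape=[None, 28, 28, 1])  # MNIST image
conv = tf.contrib.layers.convolution2d(inputs=x_image,
                                       num_outputs=32,      # 32 filters
                                       kernel_size=[5, 5],
                                       padding='VALID',
                                       normalizer_fn=tf.contrib.layers.batch_norm)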

Regarding the feature map vs. activation question, my guess is that TF adds the normalization layer as a new "node" on top of the conv2d one when building the graph, and that both of them modify the same weights variable (in your case, the W_conv1 object). I wouldn't describe the task of the norm layer as "learning" anyway, but I'm not quite sure I understood your point (maybe I can help further if you elaborate on that).

EDIT: Taking a closer look at the body of the function confirms my guess, and also explains how the normalizer_params parameter is used. Reading from line 354:

outputs = nn.conv2d(inputs, weights, [1, stride_h, stride_w, 1],
                    padding=padding)
if normalizer_fn:
  normalizer_params = normalizer_params or {}
  outputs = normalizer_fn(outputs, **normalizer_params)
else:
  ...etc...

we see that the outputs variable, holding the respective output of each layer, is sequentially overwritten. So, if a normalizer_fn is given when building the graph, the output of nn.conv2d is overwritten by the extra normalizer_fn layer. This is where **normalizer_params comes into play: it is expanded as keyword arguments to the given normalizer_fn. You can find the default parameters to batch_norm here, so passing a dictionary to normalizer_params with the ones you wish to change should do the trick, something like this:

normalizer_params = {"epsilon" : 0.314592, "center" : False}
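Given the library code quoted above, that dictionary would simply be expanded as keyword arguments of batch_norm, i.e. roughly the following (a sketch with a placeholder conv output z):

import tensorflow as tf

z = tf.placeholder(tf.float32, shape=[None, 24, 24, 32])  # placeholder conv output
normalizer_params = {"epsilon": 0.314592, "center": False}
z_bn = tf.contrib.layers.batch_norm(z, **normalizer_params)  # what convolution2d does internally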

Hope it helps!

It seems that the following example works for me:

import numpy as np
import tensorflow as tf

# use tf.contrib.layers.batch_norm; set normalizer_fn = None to disable BN
normalizer_fn = tf.contrib.layers.batch_norm

D = 5
kernel_height = 1
kernel_width = 3
F = 4
x = tf.placeholder(tf.float32, shape=[None,1,D,1], name='x-input') #[M, 1, D, 1]
conv = tf.contrib.layers.convolution2d(inputs=x,
    num_outputs=F, # 4
    kernel_size=[kernel_height, kernel_width], # [1,3]
    stride=[1,1],
    padding='VALID',
    rate=1,
    activation_fn=tf.nn.relu,
    normalizer_fn=normalizer_fn,
    normalizer_params=None,
    weights_initializer=tf.contrib.layers.xavier_initializer(dtype=tf.float32),
    biases_initializer=tf.zeros_initializer,
    trainable=True,
    scope='cnn'
)

# synthetic data
M = 2
X_data = np.array( [np.arange(0,5),np.arange(5,10)] )
print(X_data)
X_data = X_data.reshape(M,1,D,1)
with tf.Session() as sess:
    sess.run( tf.initialize_all_variables() )
    print( sess.run(fetches=conv, feed_dict={x:X_data}) )

Console output:

$ python single_convolution.py
[[0 1 2 3 4]
 [5 6 7 8 9]]
[[[[ 1.33058071  1.33073258  1.30027914  0.        ]
   [ 0.95041472  0.95052338  0.92877126  0.        ]
   [ 0.57024884  0.57031405  0.55726254  0.        ]]]


 [[[ 0.          0.          0.          0.56916821]
   [ 0.          0.          0.          0.94861376]
   [ 0.          0.          0.          1.32805932]]]]
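
One caveat, under the assumption that you will also train with this layer: tf.contrib.layers.batch_norm maintains moving averages of the batch statistics for use at test time, so in a real training script you would typically feed an is_training flag through normalizer_params and make sure the update ops run. An untested sketch:

import tensorflow as tf

D = 5
F = 4
x = tf.placeholder(tf.float32, shape=[None, 1, D, 1], name='x-input')
is_training = tf.placeholder(tf.bool, name='is_training')  # feed True while training, False at test time
conv = tf.contrib.layers.convolution2d(
    inputs=x,
    num_outputs=F,
    kernel_size=[1, 3],
    padding='VALID',
    activation_fn=tf.nn.relu,
    normalizer_fn=tf.contrib.layers.batch_norm,
    # updates_collections=None makes batch_norm update its moving averages in place
    normalizer_params={'is_training': is_training, 'updates_collections': None},
    scope='cnn_bn')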
