I was looking at the official batch normalization (BN) layer in TensorFlow, but the documentation doesn't really explain how to use it with a convolutional layer. Does anyone know how to do this? In particular, it's important that it applies and learns the same parameters per feature map (rather than per activation). In other words, it should apply and learn BN per filter.
As a specific toy example, say I want to do conv2d with BN on MNIST (essentially 2D data). One could write:
W_conv1 = weight_variable([5, 5, 1, 32]) # 32 filters of size 5x5
x_image = tf.reshape(x, [-1,28,28,1]) # MNIST image
conv = tf.nn.conv2d(x_image, W_conv1, strides=[1, 1, 1, 1], padding='VALID') # [?,24,24,32]
z = conv # [?,24,24,32]
z = BN(z) # [?,24,24,32], essentially only 32 different scale and shift parameters to learn, one pair per filter
a = tf.nn.relu(z) # [?,24,24,32]
where z = BN(z) applies BN to each feature map created by each individual filter. In pseudocode:
x_patch = x[h:h+5,w:w+5,1] # 5x5 patch to do convolution
z[h,w,f] = sum(x_patch * W[:,:,f]) # elementwise product summed over the patch, i.e. the convolution at (h,w) for filter f
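Then a proper batch norm layer is applied to it (in pseudocode, omitting important details):
z[h,w,f] = BN(z[h,w,f]) = scale[f] * (z[h,w,f] - mu[f]) / sigma[f] + shift[f]
i.e. BN is applied once per filter f, with mu[f] and sigma[f] computed over the batch and all spatial positions of that filter's feature map.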
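To make the per-filter behaviour concrete, here is a minimal NumPy sketch of the formula above. It is not TensorFlow's implementation: the function name `batch_norm_per_filter`, the `eps` value, and the random input are assumptions made for illustration, and it only covers the training-time forward pass (no moving averages).

```python
import numpy as np

def batch_norm_per_filter(z, scale, shift, eps=1e-3):
    """Normalize z of shape [M, H, W, F] with one mean/variance per filter f."""
    # Statistics are taken over batch and spatial axes, leaving one value per filter.
    mu = z.mean(axis=(0, 1, 2), keepdims=True)   # shape [1, 1, 1, F]
    var = z.var(axis=(0, 1, 2), keepdims=True)   # shape [1, 1, 1, F]
    z_hat = (z - mu) / np.sqrt(var + eps)
    # scale and shift have shape [F] and broadcast over the last axis.
    return scale * z_hat + shift

rng = np.random.default_rng(0)
z = rng.normal(loc=3.0, scale=2.0, size=(8, 24, 24, 32))  # stand-in for the conv output
scale = np.ones(32)   # 32 learnable scales, one per filter
shift = np.zeros(32)  # 32 learnable shifts, one per filter

out = batch_norm_per_filter(z, scale, shift)
print(out.shape)
print(np.abs(out.mean(axis=(0, 1, 2))).max())  # per-filter means are close to zero
```

Note that only `scale` and `shift` (2 x 32 numbers here) would be trainable, regardless of the 24x24 spatial size.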
IMPORTANT: the links I provide here affect the tf.contrib.layers.batch_norm module, and not the usual tf.nn (see comments and post below).
I didn't test it, but the way TF expects you to use it seems to be documented in the convolution2d docstring:
def convolution2d(inputs,
                  num_outputs,
                  kernel_size,
                  stride=1,
                  padding='SAME',
                  activation_fn=nn.relu,
                  normalizer_fn=None,
                  normalizer_params=None,
                  weights_initializer=initializers.xavier_initializer(),
                  weights_regularizer=None,
                  biases_initializer=init_ops.zeros_initializer,
                  biases_regularizer=None,
                  reuse=None,
                  variables_collections=None,
                  outputs_collections=None,
                  trainable=True,
                  scope=None):
  """Adds a 2D convolution followed by an optional batch_norm layer.

  `convolution2d` creates a variable called `weights`, representing the
  convolutional kernel, that is convolved with the `inputs` to produce a
  `Tensor` of activations. If a `normalizer_fn` is provided (such as
  `batch_norm`), it is then applied. Otherwise, if `normalizer_fn` is
  None and a `biases_initializer` is provided then a `biases` variable would be
  created and added the activations.
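Following this suggestion, you should build the layer with tf.contrib.layers.convolution2d and pass normalizer_fn=tf.contrib.layers.batch_norm as a parameter. Note that normalizer_fn must be the function itself, not the string 'batch_norm', because it is called on the convolution output.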
Regarding the feature map vs. activation question, my guess is that TF adds the normalization layer as a new "node" on top of the conv2d one when building the graph, and that both of them would modify the same weights variable (in your case, the W_conv1 object). I wouldn't describe the task of the norm layer as "learning" anyway, but I'm not quite sure I understood your point (I can try to help further if you elaborate on that).
EDIT: Taking a closer look at the body of the function confirms my guess, and also explains how the normalizer_params parameter is used. Reading from line 354:
outputs = nn.conv2d(inputs, weights, [1, stride_h, stride_w, 1],
                    padding=padding)
if normalizer_fn:
  normalizer_params = normalizer_params or {}
  outputs = normalizer_fn(outputs, **normalizer_params)
else:
  ...etc...
We see that the outputs variable, which holds the output of each layer in turn, is sequentially overwritten. So, if a normalizer_fn is given when building the graph, the output of nn.conv2d is replaced by the output of the extra normalizer_fn layer. This is where **normalizer_params comes into play: the dictionary is unpacked into keyword arguments for the given normalizer_fn. You can find the default parameters of batch_norm here, so passing a dictionary to normalizer_params with the ones that you wish to change should do the trick, something like this:
normalizer_params = {"epsilon" : 0.314592, "center" : False}
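To illustrate how such a dictionary reaches the normalizer, here is a small pure-Python sketch of the same control flow. Both functions are hypothetical stand-ins (they are not TensorFlow APIs), and the default values in `toy_batch_norm` are illustrative assumptions only.

```python
def toy_batch_norm(outputs, epsilon=0.001, center=True, scale=False):
    """Hypothetical normalizer: just records which settings it was called with."""
    return {"outputs": outputs, "epsilon": epsilon, "center": center, "scale": scale}

def toy_conv_layer(inputs, normalizer_fn=None, normalizer_params=None):
    """Mimics the quoted body: run the 'convolution', then the optional normalizer."""
    outputs = inputs  # stand-in for nn.conv2d(inputs, weights, ...)
    if normalizer_fn:
        normalizer_params = normalizer_params or {}
        # The dictionary is unpacked into keyword arguments here.
        outputs = normalizer_fn(outputs, **normalizer_params)
    return outputs

result = toy_conv_layer(
    [1.0, 2.0],
    normalizer_fn=toy_batch_norm,
    normalizer_params={"epsilon": 1e-5, "center": False},
)
print(result)
```

Keys that are not in the dictionary (here `scale`) keep the normalizer's own defaults, which is why only the settings you want to change need to be listed.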
Hope it helps!
It seems that the following example works for me:
import numpy as np
import tensorflow as tf

normalizer_fn = None  # no batch norm
normalizer_fn = tf.contrib.layers.batch_norm  # overrides the line above: apply BN after the convolution

D = 5
kernel_height = 1
kernel_width = 3
F = 4
x = tf.placeholder(tf.float32, shape=[None,1,D,1], name='x-input') #[M, 1, D, 1]
conv = tf.contrib.layers.convolution2d(inputs=x,
    num_outputs=F, # 4
    kernel_size=[kernel_height, kernel_width], # [1,3]
    stride=[1,1],
    padding='VALID',
    rate=1,
    activation_fn=tf.nn.relu,
    normalizer_fn=normalizer_fn,
    normalizer_params=None,
    weights_initializer=tf.contrib.layers.xavier_initializer(dtype=tf.float32),
    biases_initializer=tf.zeros_initializer,
    trainable=True,
    scope='cnn'
)

# synthetic data
M = 2
X_data = np.array( [np.arange(0,5),np.arange(5,10)] )
print(X_data)
X_data = X_data.reshape(M,1,D,1)
with tf.Session() as sess:
    sess.run( tf.initialize_all_variables() )
    print( sess.run(fetches=conv, feed_dict={x:X_data}) )
Console output:
$ python single_convolution.py
[[0 1 2 3 4]
[5 6 7 8 9]]
[[[[ 1.33058071 1.33073258 1.30027914 0. ]
[ 0.95041472 0.95052338 0.92877126 0. ]
[ 0.57024884 0.57031405 0.55726254 0. ]]]
[[[ 0. 0. 0. 0.56916821]
[ 0. 0. 0. 0.94861376]
[ 0. 0. 0. 1.32805932]]]]
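The printed values look consistent with normalization per filter over the batch and all spatial positions. The NumPy sketch below reproduces the pattern for a single filter. It rests on assumptions: the kernel `w` is hypothetical (the real weights are random), `eps` is an assumed epsilon, and batch statistics are assumed to be used, as in training mode.

```python
import numpy as np

# Same synthetic input as above: M=2 rows, D=5 columns.
X = np.array([np.arange(0, 5), np.arange(5, 10)], dtype=np.float64)

# Hypothetical 1x3 kernel for a single filter (the real weights are random).
w = np.array([0.2, -0.5, -0.1])

# VALID 1x3 convolution: 3 output positions per example -> shape [2, 3].
z = np.stack([X[:, i:i + 3] @ w for i in range(3)], axis=1)

# One mean and one variance for the whole feature map (batch and positions).
eps = 1e-3  # assumed epsilon
z_hat = (z - z.mean()) / np.sqrt(z.var() + eps)

a = np.maximum(z_hat, 0.0)  # ReLU
print(np.round(a, 2))
```

With this kernel the first example comes out close to 1.33, 0.95, 0.57 and the second is all zeros after the ReLU, which matches the first three columns of the console output. A kernel whose weights sum to a positive number would swap the two rows, as in the fourth column.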