I am trying to figure out what exactly the batch normalization in Keras does. Right now I have the following code.
for i in range(8):
    c = Convolution2D(128, 3, 3, border_mode = 'same', init = 'he_normal')(c)
    c = LeakyReLU()(c)
    c = Convolution2D(128, 3, 3, border_mode = 'same', init = 'he_normal')(c)
    c = LeakyReLU()(c)
    c = Convolution2D(128, 3, 3, border_mode = 'same', init = 'he_normal')(c)
    c = LeakyReLU()(c)
    c = merge([c, x], mode = 'sum')
    c = BatchNormalization(mode = 1)(c)
    x = c
I set the batch norm mode to 1, which the Keras documentation describes as follows:
1: sample-wise normalization. This mode assumes a 2D input.
I think this should just normalize each sample in the batch independently of every other sample. However, when I look at the source code for the call function, I see the following.
elif self.mode == 1:
    # sample-wise normalization
    m = K.mean(x, axis=-1, keepdims=True)
    std = K.std(x, axis=-1, keepdims=True)
    x_normed = (x - m) / (std + self.epsilon)
    out = self.gamma * x_normed + self.beta
Here it seems to just compute the mean over all of x, which in my case has shape (BATCH_SIZE, 128, 56, 56), I think. I thought mode 1 was supposed to normalize each sample independently of the other samples in the batch. So shouldn't it be axis = 1? Also, what does "assumes a 2D input" mean in the documentation?
"In this it is just computing the mean over all of x which in my case is (BATCH_SIZE, 128, 56, 56) I think."
By passing in that shape you are already violating the contract of the layer: this is a 4-dimensional input, not a 2-dimensional one.
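To make the effect on a 4D tensor concrete, here is a minimal NumPy sketch of the same reduction. It assumes np.mean behaves like K.mean for this purpose, and the small shape (2, 3, 4, 5) is a hypothetical stand-in for (BATCH_SIZE, 128, 56, 56).

```python
import numpy as np

# Hypothetical small stand-in for a (BATCH_SIZE, 128, 56, 56) tensor.
x = np.random.rand(2, 3, 4, 5)

# Same reduction as in the mode 1 branch: only the last axis is reduced.
m = np.mean(x, axis=-1, keepdims=True)

# One mean per (sample, channel, row), not one mean per sample.
mean_shape = m.shape
print(mean_shape)
```

With a 4D input the statistics still never mix different samples, but they are computed separately for every row of every feature map, which is probably not what "sample-wise" was meant to convey.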
"I thought it was supposed to normalize independent of the other samples in the batch when in mode 1"
It does. K.mean(..., axis=-1) reduces axis -1, which is a synonym for the last axis of the input. So, assuming an input shape of (batchsz, features), axis -1 is the features axis.
Since K.mean is very similar to numpy.mean, you can test this yourself:
>>> import numpy as np
>>> x = np.array([[1,2,3],[4,5,6]])
>>> x
array([[1, 2, 3],
       [4, 5, 6]])
>>> np.mean(x, axis=-1)
array([2., 5.])
You can see that the features got reduced per sample in the batch.
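The whole mode 1 branch can be mimicked in NumPy to check that each sample really is normalized on its own. This is only a sketch: it assumes gamma = 1 and beta = 0, the epsilon value is an arbitrary choice, and sample_wise_norm is a hypothetical helper name, not part of Keras.

```python
import numpy as np

def sample_wise_norm(x, epsilon=1e-6):
    # Mirrors the mode 1 branch with gamma = 1 and beta = 0 (assumption).
    m = np.mean(x, axis=-1, keepdims=True)
    std = np.std(x, axis=-1, keepdims=True)
    return (x - m) / (std + epsilon)

# Two batches of shape (batchsz, features); only the second sample differs.
a = np.array([[1., 2., 3.], [4., 5., 6.]])
b = np.array([[1., 2., 3.], [100., 0., -7.]])

out_a = sample_wise_norm(a)
out_b = sample_wise_norm(b)

# The first sample's result does not depend on the rest of the batch.
first_rows_match = np.allclose(out_a[0], out_b[0])
print(first_rows_match)
```

Changing the second sample leaves the normalized first sample untouched, which is the independence the question asks about.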