Keras BatchNormalization: what exactly is sample-wise normalization?

I am trying to figure out what exactly the batch normalization in Keras does. Right now I have the following code.

for i in range(8):
    c = Convolution2D(128, 3, 3, border_mode = 'same', init = 'he_normal')(c)
    c = LeakyReLU()(c)
    c = Convolution2D(128, 3, 3, border_mode = 'same', init = 'he_normal')(c)
    c = LeakyReLU()(c)
    c = Convolution2D(128, 3, 3, border_mode = 'same', init = 'he_normal')(c)
    c = LeakyReLU()(c)
    c = merge([c, x], mode = 'sum')
    c = BatchNormalization(mode = 1)(c)
    x = c

I set the batch norm mode to 1, which according to the Keras documentation means: "1: sample-wise normalization. This mode assumes a 2D input."

What I think this should be doing is just normalizing each sample in the batch independently of every other sample. However, when I look at the source code for the call function, I see the following.

    elif self.mode == 1:
        # sample-wise normalization
        m = K.mean(x, axis=-1, keepdims=True)
        std = K.std(x, axis=-1, keepdims=True)
        x_normed = (x - m) / (std + self.epsilon)
        out = self.gamma * x_normed + self.beta

In this it is just computing the mean over all of x, which in my case has shape (BATCH_SIZE, 128, 56, 56), I think. I thought it was supposed to normalize independently of the other samples in the batch when in mode 1. So shouldn't it be axis=1? Also, what does "assumes a 2D input" mean in the documentation?

"In this it is just computing the mean over all of x, which in my case has shape (BATCH_SIZE, 128, 56, 56), I think."

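Passing that input already violates the layer's contract: it is a 4-dimensional tensor, not a 2-dimensional one. With a 4D input, axis=-1 reduces only the last axis (here the final dimension of size 56), so the statistics are computed per row of each feature map rather than over the whole sample.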
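To make the shape issue concrete, here is a minimal NumPy sketch. It assumes that np.mean and np.std reduce axes the same way as the K.mean and K.std calls in the quoted source. The small (2, 3, 4, 5) tensor is a hypothetical stand-in for (BATCH_SIZE, 128, 56, 56), and the reshape at the end only illustrates what a 2D (batch, features) input looks like; it is not something the Keras layer does for you.

```python
import numpy as np

# Hypothetical small stand-in for a (BATCH_SIZE, 128, 56, 56) tensor.
batch = np.random.RandomState(0).randn(2, 3, 4, 5)

# Same reduction as the mode 1 branch: the last axis only.
m = batch.mean(axis=-1, keepdims=True)
std = batch.std(axis=-1, keepdims=True)

# One mean per (sample, channel, row), not one per sample.
print(m.shape)  # (2, 3, 4, 1)

# Flattening each sample to a vector gives the 2D (batch, features)
# layout that the documentation refers to.
flat = batch.reshape(batch.shape[0], -1)
flat_m = flat.mean(axis=-1, keepdims=True)

# Now there is exactly one mean per sample.
print(flat_m.shape)  # (2, 1)
```

On the 4D tensor the reduction yields one statistic per row of each feature map, whereas on the flattened 2D version it yields one statistic per sample.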

"I thought it was supposed to normalize independently of the other samples in the batch when in mode 1."

It does. K.mean(..., axis=-1) reduces axis -1, which is the last axis of the input. So assuming an input shape of (batchsz, features), axis -1 is the features axis.

Since K.mean is very similar to numpy.mean you can test this yourself:

>>> import numpy as np
>>> x = np.array([[1, 2, 3], [4, 5, 6]])
>>> x
array([[1, 2, 3],
       [4, 5, 6]])
>>> np.mean(x, axis=-1)
array([2., 5.])

You can see that the features got reduced per sample in the batch.
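The same check can be extended to the whole mode 1 formula. The sketch below restates the quoted branch in NumPy rather than calling Keras itself. The function name sample_wise_normalize is hypothetical, gamma and beta are simplified to scalars (in the layer they are learned parameters), and the epsilon default is an assumed value.

```python
import numpy as np

def sample_wise_normalize(x, gamma=1.0, beta=0.0, epsilon=1e-6):
    """NumPy restatement of the quoted mode 1 branch (illustrative only)."""
    m = np.mean(x, axis=-1, keepdims=True)
    std = np.std(x, axis=-1, keepdims=True)
    x_normed = (x - m) / (std + epsilon)
    return gamma * x_normed + beta

x = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
out = sample_wise_normalize(x)

# Replace the second sample; the first sample's output should not change
# if normalization is truly per sample.
x_changed = x.copy()
x_changed[1] = [100.0, -50.0, 7.0]
out_changed = sample_wise_normalize(x_changed)

print(np.allclose(out[0], out_changed[0]))  # True
print(np.allclose(out.mean(axis=-1), 0.0))  # True
```

Changing one sample leaves the normalized values of the other untouched, which is what "sample-wise" means for a 2D (batchsz, features) input.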
