
Is output of Batch Normalization in Keras dependent on number of epochs?

I am inspecting the output of BatchNormalization in Keras. My model is:

#Import libraries

import numpy as np
import keras
from keras import layers
from keras.layers import Input, Dense, Activation,  BatchNormalization, Flatten, Conv2D
from keras.models import Model

#Model

def HappyModel3(input_shape):
    # Batch norm applied directly to the input, normalizing along axis 1 (the features).
    X_input = Input(input_shape, name='input_layer')
    X = BatchNormalization(axis=1, name='batchnorm_layer')(X_input)
    X = Dense(1, activation='sigmoid', name='sigmoid_layer')(X)

    model = Model(inputs=X_input, outputs=X, name='HappyModel3')
    return model


Compiling the model; here the number of epochs is 1:

X_train=np.array([[1,1,-1],[2,1,1]])
Y_train=np.array([0,1])

happyModel_1=HappyModel3(X_train[0].shape)
happyModel_1.compile(optimizer=keras.optimizers.RMSprop(), loss=keras.losses.mean_squared_error)
happyModel_1.fit(x = X_train, y = Y_train, epochs = 1 , batch_size = 2, verbose=0 )

Finding the Batch Normalization layer's output for the model with epochs=1:

# Build a sub-model up to each layer and run predict() to inspect its output.
for i in range(0, len(happyModel_1.layers)):
    tmp_model = Model(happyModel_1.layers[0].input, happyModel_1.layers[i].output)
    tmp_output = tmp_model.predict(X_train)

    if i in (0, 1):
        print(happyModel_1.layers[i].name)
        print(tmp_output.shape)
        print(tmp_output)
        print('\n')

Code output is:

input_layer
(2, 3)
[[ 1.  1. -1.]
 [ 2.  1.  1.]]


batchnorm_layer
(2, 3)
[[ 0.99003249  0.99388224 -0.99551398]
 [ 1.99647105  0.99388224  0.9971655 ]]

We've normalized at axis=1. In the batch norm layer's output, at axis=1, the 1st dimension mean is 1.5, the 2nd dimension mean is 1, and the 3rd dimension mean is 0. Since this is batch norm, I expect the mean to be close to 0 for all 3 dimensions.
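
As a quick check (a sketch for illustration, with the values copied from the output above), the per-feature means can be computed with numpy; since axis=1 is the feature axis here, the statistics run over the batch axis 0:

import numpy as np

bn_out = np.array([[ 0.99003249,  0.99388224, -0.99551398],
                   [ 1.99647105,  0.99388224,  0.9971655 ]])

# Mean over the batch axis gives one value per feature.
print(bn_out.mean(axis=0))   # approx. [1.49, 0.99, 0.0] -- not centered at 0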

This happens when I increase epochs to 1000:

happyModel_2=HappyModel3(X_train[0].shape)
happyModel_2.compile(optimizer=keras.optimizers.RMSprop(), loss=keras.losses.mean_squared_error)
happyModel_2.fit(x = X_train, y = Y_train, epochs = 1000 , batch_size = 2, verbose=0 )

Finding the Batch Normalization layer's output for the model with epochs=1000:

for i in range(0, len(happyModel_2.layers)):
    tmp_model = Model(happyModel_2.layers[0].input, happyModel_2.layers[i].output)
    tmp_output = tmp_model.predict(X_train)

    if i in (0, 1):
        print(happyModel_2.layers[i].name)
        print(tmp_output.shape)
        print(tmp_output)
        print('\n')

#Code output

input_layer
(2, 3)
[[ 1.  1. -1.]
 [ 2.  1.  1.]]


batchnorm_layer
(2, 3)
[[ -1.95576239e+00   8.08715820e-04  -1.86621261e+00]
 [  1.95795488e+00   8.08715820e-04   1.86590290e+00]]

We've normalized at axis=1. Now, at axis=1, the batch norm layer's output has: 1st dimension mean 0, 2nd dimension mean 0, 3rd dimension mean 0. THIS IS THE EXPECTED OUTPUT NOW.

My question is: is the output of Batch Normalization in Keras dependent on the number of epochs? (Probably yes: as we do backpropagation, the batch normalization parameters will be affected by the increasing number of epochs.)

The Keras documentation for BatchNormalization gives an answer to your question:

Importantly, batch normalization works differently during training and during inference.

What happens during training, i.e. when calling model.fit()?

During training [...], the layer normalizes its output using the mean and standard deviation of the current batch of inputs.

But what will happen during inference, i.e. when calling model.predict() as in your examples?

During inference [...], the layer normalizes its output using a moving average of the mean and standard deviation of the batches it has seen during training. That is to say, it returns (batch - self.moving_mean) / (self.moving_var + epsilon) * gamma + beta.

self.moving_mean and self.moving_var are non-trainable variables that are updated each time the layer is called in training mode [...].
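
To make this concrete, here is a minimal sketch (my own illustration, not from the documentation) that recomputes the inference output from the layer's stored weights. It assumes the default layer configuration (center=True, scale=True), in which get_weights() returns [gamma, beta, moving_mean, moving_variance]; note that the actual computation divides by the square root of moving_var + epsilon, which the quoted docstring elides:

import numpy as np

bn_layer = happyModel_1.get_layer('batchnorm_layer')
gamma, beta, moving_mean, moving_var = bn_layer.get_weights()

# Inference-mode batch norm, built from the layer's accumulated statistics.
manual = (X_train - moving_mean) / np.sqrt(moving_var + bn_layer.epsilon) * gamma + beta
print(manual)   # matches the batchnorm_layer output of predict() above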

It's important to understand that batch normalization calculates the statistics (mean and variance) of your whole training data during training by looking at the statistics of single batches and internally updating the moving_mean and moving_variance parameters with a running average computed from the single-batch statistics. Therefore they are not affected by backpropagation. Ideally, after your model has seen enough training examples (or done enough training epochs), moving_mean and moving_variance will correspond to the statistics of your whole training set. These two parameters are then used during inference to normalize test examples. At the start of training the two parameters are initialized to 0 and 1. Furthermore, batch norm has two more parameters called gamma and beta, which are updated by the optimizer and therefore depend on your loss.
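
As an illustration (a sketch assuming the default momentum=0.99 of Keras' BatchNormalization; the variable names are mine), the running-average update per training batch looks roughly like this, which also shows why many epochs are needed before the moving statistics converge:

import numpy as np

momentum = 0.99                    # Keras BatchNormalization default
moving_mean = np.zeros(3)          # initialized to 0
moving_var = np.ones(3)            # initialized to 1

batch_mean = X_train.mean(axis=0)  # statistics of the single batch above
batch_var = X_train.var(axis=0)

for _ in range(1000):              # one batch per epoch in the example above
    moving_mean = momentum * moving_mean + (1 - momentum) * batch_mean
    moving_var = momentum * moving_var + (1 - momentum) * batch_var

print(moving_mean)  # ~ batch_mean: the initial value has decayed by 0.99**1000 ~ 4e-5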

In essence, yes, the output of batch normalization during inference depends on the number of epochs you have trained your model for: firstly, because of the changing moving averages for mean and variance, and secondly, because of the learned parameters gamma and beta.
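
A quick way to see both effects on the two models from the question (a sketch reusing happyModel_1 and happyModel_2 as defined above):

for name, m in [('epochs=1', happyModel_1), ('epochs=1000', happyModel_2)]:
    gamma, beta, moving_mean, moving_var = m.get_layer('batchnorm_layer').get_weights()
    print(name)
    print('  moving_mean:', moving_mean, '  moving_var:', moving_var)  # running averages
    print('  gamma:', gamma, '  beta:', beta)                          # learned via the loss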

For a deeper understanding of how batch normalization works and why it is needed, have a look at the original publication.
