
How can I use Batch Normalization to normalize the batch dimension?

I want to use BatchNormalization to normalize the batch dimension, but naturally the batch dimension in Keras is None. So what can I do?

The Keras example shows that the axis is -1 for Conv2D, which means the channel dimension.

keras.layers.BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True, beta_initializer='zeros', gamma_initializer='ones', moving_mean_initializer='zeros', moving_variance_initializer='ones', beta_regularizer=None, gamma_regularizer=None, beta_constraint=None, gamma_constraint=None)

axis: Integer, the axis that should be normalized (typically the features axis). For instance, after a Conv2D layer with data_format="channels_first", set axis=1 in BatchNormalization.
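For concreteness, here is a minimal sketch (assuming tf.keras; the layer sizes and input shapes are purely illustrative) of how the axis argument follows the channel dimension for the two data_format options:

    import tensorflow as tf
    from tensorflow.keras import layers

    # channels_last (default): inputs are (batch, height, width, channels) -> axis=-1
    x = layers.Input(shape=(32, 32, 3))
    y = layers.Conv2D(16, 3, padding="same")(x)
    y = layers.BatchNormalization(axis=-1)(y)   # gamma/beta/moving stats have shape (16,)

    # channels_first: inputs are (batch, channels, height, width) -> axis=1
    x2 = layers.Input(shape=(3, 32, 32))
    y2 = layers.Conv2D(16, 3, padding="same", data_format="channels_first")(x2)
    y2 = layers.BatchNormalization(axis=1)(y2)  # still shape (16,), one entry per channel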

It simply makes no sense to apply the BN layer to the batch axis.

Why? If this were plausible, you would end up learning BN parameters as several trainable vectors of batch_size dimension. OK, so what? You can still train such a model without seeing an error message.

But what about testing? Such a BN simply implies that you have to do inference with the exact same batch_size as in training. Otherwise, the tensor operation will be ill-defined and you will see an error.
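A minimal NumPy sketch (not Keras code; the names and shapes are purely illustrative) of what normalizing the batch axis would imply at inference time:

    import numpy as np

    batch_size, features = 32, 10
    gamma = np.ones(batch_size)   # one scale per batch position
    beta = np.zeros(batch_size)   # one shift per batch position

    def batch_axis_bn(x, eps=1e-3):
        # "Normalizing the batch axis" means the statistics are taken over the
        # remaining axes, and gamma/beta are indexed by position in the batch.
        mean = x.mean(axis=1, keepdims=True)
        var = x.var(axis=1, keepdims=True)
        x_hat = (x - mean) / np.sqrt(var + eps)
        return gamma[:, None] * x_hat + beta[:, None]

    out = batch_axis_bn(np.random.randn(batch_size, features))  # fine: 32 rows

    # At inference, a batch of a different size no longer lines up with
    # gamma/beta of shape (32,):
    # batch_axis_bn(np.random.randn(16, features))
    # -> ValueError: operands could not be broadcast together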

More importantly, the BN you propose means treating samples differently according to their relative positions within a batch, because you would always normalize the samples appearing in the first position of a batch with one set of parameters while using another set of parameters for samples appearing at a different position. Again, you may say: so what?
However, the fact is that you will have to shuffle your training samples anyway, which implies that such relative positions within a batch are completely meaningless. In other words, learning anything about these relative positions is doomed to fail.
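A tiny sketch to make the shuffling point concrete (the sample count and batch size are arbitrary): with shuffling enabled, the sample that lands in a given batch position changes every epoch, so per-position parameters have nothing stable to fit.

    import numpy as np

    num_samples, batch_size = 8, 4
    rng = np.random.default_rng(0)

    for epoch in range(3):
        order = rng.permutation(num_samples)  # roughly what shuffle=True does each epoch
        print(f"epoch {epoch}: first batch holds samples {order[:batch_size]}")
    # The sample sitting at position 0 differs from epoch to epoch, so a
    # "position-0" gamma/beta would be fit to a different sample every time.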

