
What's the difference between the attributes 'trainable' and 'training' in the BatchNormalization layer in Keras/TensorFlow?

According to the official documentation from TensorFlow:

About setting layer.trainable = False on a BatchNormalization layer:
The meaning of setting layer.trainable = False is to freeze the layer, i.e. its internal state will not change during training: its trainable weights will not be updated during fit() or train_on_batch(), and its state updates will not be run.
Usually, this does not necessarily mean that the layer is run in inference mode (which is normally controlled by the training argument that can be passed when calling a layer). "Frozen state" and "inference mode" are two separate concepts.
However, in the case of the BatchNormalization layer, setting trainable = False on the layer means that the layer will subsequently run in inference mode (meaning that it will use the moving mean and the moving variance to normalize the current batch, rather than using the mean and variance of the current batch).
This behavior has been introduced in TensorFlow 2.0, in order to enable layer.trainable = False to produce the most commonly expected behavior in the convnet fine-tuning use case.

I don't quite understand the terms 'frozen state' and 'inference mode' here. I tried fine-tuning by setting trainable to False, and I found that the moving mean and moving variance are not being updated, as in the sketch below.
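For reference, here is a minimal sketch of what I tried (assuming TF 2.x and tf.keras; the shapes, data, and identity targets are just placeholders to run a training step):

```python
import numpy as np
import tensorflow as tf

# One BatchNormalization layer wrapped in a trivial model.
bn = tf.keras.layers.BatchNormalization()
model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), bn])

bn.trainable = False  # freeze the BN layer before compiling
model.compile(optimizer="sgd", loss="mse")

x = np.random.rand(32, 4).astype("float32")
before = bn.moving_mean.numpy().copy()
model.fit(x, x, epochs=1, verbose=0)  # identity targets, just to run a step
after = bn.moving_mean.numpy()

print(np.allclose(before, after))  # True: moving statistics were not updated
```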

So I have the following questions:

  1. What's the difference between the two attributes, training and trainable?
  2. Are gamma and beta updated during training if trainable is set to False?
  3. Why is it necessary to set trainable to False when fine-tuning?

What's the difference between the two attributes, training and trainable?

trainable: (if True) it means that the "trainable" weights of the layer will be updated during backpropagation.

training: some layers behave differently during the training and inference (or testing) steps. Examples include the Dropout and Batch-Normalization layers. This argument tells the layer in which mode it should run.
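A minimal sketch (assuming TF 2.x / tf.keras) contrasting the two, using Dropout since it has a mode but no weights:

```python
import numpy as np
import tensorflow as tf

x = np.ones((4, 8), dtype="float32")

# 'training' is a per-call argument selecting the layer's mode.
drop = tf.keras.layers.Dropout(0.5)
print(np.array_equal(drop(x, training=False).numpy(), x))  # True: dropout is a no-op
print(np.array_equal(drop(x, training=True).numpy(), x))   # False: units are dropped

# 'trainable' is a layer attribute controlling weight updates in backprop.
# Dropout has no weights, so freezing it changes nothing about its behavior;
# the two settings are independent for most layers. BatchNormalization is the
# special case where trainable = False additionally forces inference mode.
drop.trainable = False
print(np.array_equal(drop(x, training=True).numpy(), x))   # still False
```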

Are gamma and beta updated during training if trainable is set to False?

Since gamma and beta are "trainable" parameters of the BN layer, they will NOT be updated during training if trainable is set to False.
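A quick way to verify this is to inspect the layer's weight collections before and after freezing; a sketch, assuming tf.keras:

```python
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()
bn.build((None, 4))  # creates gamma, beta, moving_mean, moving_variance

print(len(bn.trainable_weights))      # 2: gamma and beta
print(len(bn.non_trainable_weights))  # 2: moving_mean and moving_variance

bn.trainable = False
# gamma and beta are now excluded from gradient updates as well.
print(len(bn.trainable_weights))      # 0
print(len(bn.non_trainable_weights))  # 4
```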

Why is it necessary to set trainable to false when fine-tuning?

When fine-tuning, we first add our own classification FC layer on top; it is randomly initialized, whereas our "pre-trained" model is already calibrated (somewhat) for the task.

As an analogy, think of it like this.

You have a number line from 0 to 10. On this number line, '0' represents a completely randomized model, whereas '10' represents a near-perfect model. Our pre-trained model is somewhere around 5, 6, or 7, i.e. most probably better than a random model. The FC layer we have added on top is at '0', as it is randomized at the start.

We set trainable = False for the pre-trained model so that the FC layer can reach the level of the pre-trained model rapidly, i.e. with a higher learning rate. If we don't set trainable = False for the pre-trained model and use a higher learning rate, it will wreak havoc on the pre-trained weights.

So initially, we set a higher learning rate and trainable = False for the pre-trained model, and train only the FC layer. After that, we unfreeze the pre-trained model and use a very low learning rate to serve our purpose, as in the sketch below.
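A minimal sketch of this two-phase recipe, assuming tf.keras; the MobileNetV2 base, the 10-class head, the learning rates, and train_ds are all placeholders:

```python
import tensorflow as tf

# Phase 1: freeze the pre-trained base and train only the new FC head.
base = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet")
base.trainable = False  # BN layers inside the base now run in inference mode

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),  # randomly initialized head
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),  # higher learning rate
              loss="sparse_categorical_crossentropy")
# model.fit(train_ds, epochs=5)

# Phase 2: unfreeze the base and continue with a much lower learning rate.
base.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),  # very low learning rate
              loss="sparse_categorical_crossentropy")
# model.fit(train_ds, epochs=5)
```

Note that the model has to be re-compiled after toggling trainable for the change to take effect.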

Do feel free to ask for more clarification if required, and upvote if you find it helpful.

