What's the difference between the attributes 'trainable' and 'training' in the BatchNormalization layer in Keras TensorFlow?

Asked 2020-07-04 13:34:44. Tags: python / tensorflow / keras / tf.keras / batch-normalization

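According to the official TensorFlow documentation:

About setting layer.trainable = False on a BatchNormalization layer:
The meaning of setting layer.trainable = False is to freeze the layer, i.e. its internal state will not change during training: its trainable weights will not be updated during fit() or train_on_batch(), and its state updates will not be run.
Usually, this does not necessarily mean that the layer is run in inference mode (which is normally controlled by the training argument that can be passed when calling a layer). "Frozen state" and "inference mode" are two separate concepts.
However, in the case of the BatchNormalization layer, setting trainable = False on the layer means that the layer will subsequently be run in inference mode (meaning that it will use the moving mean and the moving variance to normalize the current batch, rather than using the mean and variance of the current batch).
This behavior was introduced in TensorFlow 2.0, in order to enable layer.trainable = False to produce the most commonly expected behavior in the convnet fine-tuning use case.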
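To make the quoted rule concrete, here is a minimal pure-Python sketch. ToyBatchNorm is a hypothetical class written for this illustration and is not the Keras implementation; the momentum and epsilon defaults are assumptions, and gradient updates to gamma and beta are not modelled. It only shows that a frozen layer normalizes with its moving statistics and leaves them untouched, even when called with training=True.

```python
from statistics import fmean, pvariance


class ToyBatchNorm:
    """Hypothetical 1-D batch norm for illustration; not the Keras implementation."""

    def __init__(self, momentum=0.99, eps=1e-3):
        self.gamma, self.beta = 1.0, 0.0               # trainable weights
        self.moving_mean, self.moving_var = 0.0, 1.0   # non-trainable state
        self.momentum, self.eps = momentum, eps
        self.trainable = True                          # the layer is "frozen" when False

    def __call__(self, batch, training=False):
        # Rule from the quoted passage: a frozen layer runs in inference mode.
        if training and self.trainable:
            mean, var = fmean(batch), pvariance(batch)
            m = self.momentum
            self.moving_mean = m * self.moving_mean + (1 - m) * mean
            self.moving_var = m * self.moving_var + (1 - m) * var
        else:
            mean, var = self.moving_mean, self.moving_var
        scale = self.gamma / (var + self.eps) ** 0.5
        return [scale * (v - mean) + self.beta for v in batch]


layer = ToyBatchNorm()
batch = [4.0, 6.0]

out_training = layer(batch, training=True)   # batch statistics; moving stats updated
stats_after_training = (layer.moving_mean, layer.moving_var)

layer.trainable = False
out_frozen = layer(batch, training=True)     # moving statistics; nothing updated
stats_after_frozen = (layer.moving_mean, layer.moving_var)
```

In this sketch the first call centres the batch around zero, while the frozen call does not, because the moving mean is still close to its initial value.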

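I don't quite understand the terms "frozen state" and "inference mode" in this passage. I tried fine-tuning with trainable set to False, and I found that the moving mean and moving variance were not updated.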

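So I have the following questions:

What's the difference between the two attributes training and trainable?
Are gamma and beta updated during training if trainable is set to False?
Why is it necessary to set trainable to False when fine-tuning?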

1 Answer

What's the difference between 2 attributes training and trainable?

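trainable: if True, it basically means that the layer's "trainable" weights will be updated during backpropagation.

training: some layers behave differently during the training and inference (or testing) steps. Examples include the Dropout and BatchNormalization layers. This flag, passed as an argument when the layer is called, tells the layer which of the two modes to run in.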
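A short sketch of where each of the two flags lives in tf.keras, assuming TensorFlow 2.x is installed. The layer sizes and the dropout rate are arbitrary choices for the illustration.

```python
import numpy as np
import tensorflow as tf

x = np.ones((4, 8), dtype="float32")

# `training` is a call-time argument: it selects the forward-pass behaviour.
dropout = tf.keras.layers.Dropout(0.5)
same = np.asarray(dropout(x, training=False))   # inference: input passes through unchanged
noisy = np.asarray(dropout(x, training=True))   # training: units are zeroed or scaled by 1 / (1 - rate)

# `trainable` is a layer attribute: it decides which weights the optimizer may update.
dense = tf.keras.layers.Dense(2)
dense.build((None, 8))
n_before = len(dense.trainable_weights)   # kernel and bias
dense.trainable = False
n_after = len(dense.trainable_weights)    # the weights are now listed as non-trainable
```

Dropout has no weights at all, so trainable has nothing to act on there, yet training still changes its output. Dense is the opposite case: training makes no difference to its forward pass, while trainable controls whether its kernel and bias are updated.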

Are gamma and beta updated during training if trainable is set to False?

Since gamma and beta are the "trainable" parameters of the BN layer, they will not be updated during training if trainable is set to False.
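One way to check this yourself is to snapshot the layer's weights before and after fit(). The sketch below assumes TensorFlow 2.x; the tiny model and the random data are made up purely for the check.

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
x = rng.normal(3.0, 2.0, size=(256, 4)).astype("float32")
y = rng.normal(size=(256, 1)).astype("float32")

bn = tf.keras.layers.BatchNormalization()
head = tf.keras.layers.Dense(1)
model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), bn, head])

bn.trainable = False                       # freeze the BN layer before compile()
model.compile(optimizer="sgd", loss="mse")

# bn.weights holds gamma, beta, moving_mean and moving_variance.
bn_before = [w.numpy().copy() for w in bn.weights]
head_before = head.kernel.numpy().copy()

model.fit(x, y, epochs=2, batch_size=32, verbose=0)

bn_unchanged = all(
    np.array_equal(before, w.numpy()) for before, w in zip(bn_before, bn.weights)
)
head_changed = not np.array_equal(head_before, head.kernel.numpy())
```

With the BN layer frozen, all four of its weights should come out identical after training, which also matches the observation in the question that the moving mean and moving variance were not updated. The Dense head is still trainable, so its kernel moves.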

Why is it necessary to set trainable to false when fine-tuning?

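When fine-tuning, we first add our own classification FC layer on top. That layer is randomly initialized, whereas our "pretrained" model is already (somewhat) calibrated for the task.

As an analogy, think of it this way.

You have a number line from 0 to 10. On this number line, "0" represents a completely random model and "10" represents a perfect model. Our pretrained model sits somewhere around 5, 6 or 7, i.e. it is most likely better than a random model. The FC layer we added on top sits at "0", because it is random at the start.

We set trainable = False for the pretrained model so that we can bring the FC layer up to the level of the pretrained model quickly, i.e. with a higher learning rate. If we do not set trainable = False for the pretrained model and use a higher learning rate, it will wreak havoc on the pretrained weights.

So, initially, we use a higher learning rate with trainable = False on the pretrained model and train only the FC layer. After that, we unfreeze the pretrained model and continue training with a very low learning rate.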
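The two phases described above can be sketched roughly as follows, assuming TensorFlow 2.x. The `base` model here is a hypothetical stand-in for a pretrained network (it is randomly initialized so the sketch runs without downloading weights), and the learning rates 1e-3 and 1e-5, the data, and the layer sizes are assumptions chosen only for illustration.

```python
import numpy as np
import tensorflow as tf

# Stand-in for a pretrained feature extractor (hypothetical, randomly initialized).
base = tf.keras.Sequential([
    tf.keras.Input(shape=(16,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.BatchNormalization(),
], name="base")

# New classification head on top of the base.
inputs = tf.keras.Input(shape=(16,))
outputs = tf.keras.layers.Dense(3, activation="softmax")(base(inputs))
model = tf.keras.Model(inputs, outputs)

x = np.random.default_rng(0).normal(size=(128, 16)).astype("float32")
y = np.random.default_rng(1).integers(0, 3, size=128)

# Phase 1: freeze the base and train only the head with a higher learning rate.
base.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy")
phase1_trainable = len(model.trainable_weights)   # head kernel and bias only
model.fit(x, y, epochs=1, verbose=0)

# Phase 2: unfreeze the base and continue with a much lower learning rate.
base.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="sparse_categorical_crossentropy")
phase2_trainable = len(model.trainable_weights)   # head plus the base's weights
model.fit(x, y, epochs=1, verbose=0)
```

The model is compiled again after changing trainable so that the change is picked up by the training step.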

Feel free to ask for more clarification if needed, and please upvote if you found this helpful.

Notice: technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original source.
