
What is the difference in BatchNorm layer backpropagation between model.train() and model.eval() in PyTorch?

I tested the gradients of a BatchNorm layer in two modes: model.train() and model.eval(). I built a simple CNN, NetWork, and fed the same input X to it in both model.train() mode and model.eval() mode. I know how the BatchNorm layer behaves differently in model.train() and model.eval(), so I replaced the running mean and variance used in model.eval() mode with the batch mean and variance computed in model.train() mode. As a result, both the outputs and the parameters of the two modes are identical. However, when I computed the gradients of each parameter, I found that the gradients of the layer before the BatchNorm layer differ, even though the parameters and the loss are the same. I think this is due to the difference in BatchNorm backpropagation between model.train() and model.eval(), but I don't understand the details. Does anyone know? Thank you so much.
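For reference, here is a minimal sketch of the setup described above (the architecture, sizes, and the name NetWork are illustrative, not the asker's actual code). It copies the batch statistics of X into the BatchNorm running buffers so both modes produce the same output and loss, then compares the gradients of the conv layer in front of BatchNorm:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical small CNN: conv -> batchnorm -> relu -> linear head
class NetWork(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(8)
        self.fc = nn.Linear(8 * 4 * 4, 10)

    def forward(self, x):
        x = torch.relu(self.bn(self.conv(x)))
        return self.fc(x.flatten(1))

net = NetWork()
x = torch.randn(16, 3, 4, 4)
target = torch.randint(0, 10, (16,))
loss_fn = nn.CrossEntropyLoss()

# Copy this batch's (biased) statistics into the running buffers, so the
# eval-mode forward pass matches the train-mode one.
with torch.no_grad():
    feat = net.conv(x)
    net.bn.running_mean.copy_(feat.mean(dim=(0, 2, 3)))
    net.bn.running_var.copy_(feat.var(dim=(0, 2, 3), unbiased=False))
net.bn.momentum = 0.0  # freeze the running stats during the test

def loss_and_conv_grad(train_mode):
    net.train(train_mode)
    net.zero_grad()
    loss = loss_fn(net(x), target)
    loss.backward()
    return loss.item(), net.conv.weight.grad.clone()

loss_train, grad_train = loss_and_conv_grad(True)
loss_eval, grad_eval = loss_and_conv_grad(False)

print(f"loss train={loss_train:.6f}  eval={loss_eval:.6f}")  # (near) identical
print("conv grads equal:", torch.allclose(grad_train, grad_eval))  # False
```

On a typical run the two losses agree to float precision, yet the gradients of net.conv.weight do not, which reproduces the observation in the question.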

When the mode is .train(), the BatchNorm layer computes the batchwise mean and variance of its input and uses them to normalize the input. These batch statistics are also used to update the running (moving-average) mean and variance.

When the mode is .eval(), the BatchNorm layer does not compute the mean and variance of its input; instead, it normalizes with the running mean and variance that were accumulated during the training stage.

This way, your prediction for a single image won't change at test time when the other samples in the batch change.
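A quick way to check both behaviors (the shapes and seed are arbitrary; momentum defaults to 0.1 in nn.BatchNorm2d):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(4)              # default momentum = 0.1
x = torch.randn(8, 4, 5, 5)
batch_mean = x.mean(dim=(0, 2, 3))  # per-channel mean over N, H, W

bn.train()
bn(x)
# running_mean = (1 - momentum) * 0 + momentum * batch_mean
print(torch.allclose(bn.running_mean, 0.1 * batch_mean))  # True

frozen = bn.running_mean.clone()
bn.eval()
bn(x)                               # uses the buffers, never updates them
print(torch.allclose(bn.running_mean, frozen))            # True
```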

https://pytorch.org/docs/stable/_modules/torch/nn/modules/batchnorm.html#BatchNorm2d

In the code linked above, running_mean and running_var are the moving-average mean and variance of the BatchNorm layer's input feature map, computed during the training stage.
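As for why the gradients differ even when the forward passes match: in .train() mode the batch mean and variance are themselves functions of the input, so autograd backpropagates through them and the backward pass picks up extra terms, whereas in .eval() mode running_mean and running_var are constant buffers and BatchNorm acts as a fixed per-channel affine transform. A minimal check with a standalone BatchNorm1d (illustrative sizes; the cubic loss is just an arbitrary nonlinearity so neither gradient is trivially zero):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(3, momentum=0.0)   # momentum=0 keeps the buffers frozen
x = torch.randn(6, 3)

# Make the running stats equal to this batch's (biased) statistics, so the
# forward pass is the same in both modes -- mirroring the question's setup.
with torch.no_grad():
    bn.running_mean.copy_(x.mean(dim=0))
    bn.running_var.copy_(x.var(dim=0, unbiased=False))

def input_grad(train_mode):
    bn.train(train_mode)
    xg = x.clone().requires_grad_(True)
    (bn(xg) ** 3).sum().backward()
    return xg.grad

g_train = input_grad(True)    # grads also flow through batch mean/var
g_eval = input_grad(False)    # mean/var are constants here
print(torch.allclose(g_train, g_eval))   # False: backward passes differ
```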
