
Fine-tune with batch normalization in Keras

I have successfully trained a model on 100,000 samples, and it performs well on both the training set and the test set. Then I tried to fine-tune it on one particular sample (one of the 100,000) using the trained weights as the initialization.

But the result is a little strange, and I believe it is caused by the batch normalization layer. Specifically, my code is as follows:

model = mymodel()
model.load_weights('./pre_trained.h5')  # start from the pre-trained weights
rate = model.evaluate(x, y)
print(rate)
checkpoint = tf.keras.callbacks.ModelCheckpoint('./trained.h5', monitor='loss',
        verbose=0, save_best_only=True, mode='min', save_weights_only=True)
model.fit(x, y, validation_data=(x, y), epochs=5, verbose=2, callbacks=[checkpoint])

model.load_weights('./trained.h5')
rate = model.evaluate(x, y)
print(rate)

mymodel is a self-defined function that builds my model, consisting of Dense and BatchNormalization layers. x and y are the input and label of the one particular sample whose loss I want to optimize further. However, the results are strange:
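For reference, a minimal sketch of what such a model-building function could look like; the input size, layer widths, activations, and loss below are illustrative placeholders, not the actual architecture:

```python
import tensorflow as tf

# Hypothetical stand-in for mymodel(): a small fully-connected network
# with a BatchNormalization layer between Dense layers. All sizes and
# the loss function are made-up placeholders.
def mymodel():
    inp = tf.keras.layers.Input(shape=(32,))
    out = tf.keras.layers.Dense(64, activation='relu')(inp)
    out = tf.keras.layers.BatchNormalization()(out)
    out = tf.keras.layers.Dense(1)(out)
    model = tf.keras.models.Model(inputs=inp, outputs=out)
    model.compile(optimizer='adam', loss='mse')
    return model
```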

 1/1 [==============================] - 0s 209ms/step
-6.087581634521484
Train on 1 samples, validate on 1 samples
Epoch 1/200
 - 1s - loss: -2.7749e-01 - val_loss: -6.0876e+00
Epoch 2/200
 - 0s - loss: -2.8791e-01 - val_loss: -6.0876e+00
Epoch 3/200
 - 0s - loss: -3.0012e-01 - val_loss: -6.0876e+00
Epoch 4/200
 - 0s - loss: -3.1325e-01 - val_loss: -6.0876e+00

As shown, model.evaluate works well at first: the loss it reports (-6.087581634521484) is close to the performance of the loaded pre-trained model. But the loss over the training set (which is actually the same sample as the validation set in model.fit()) is strange. The val_loss is normal, similar to the model.evaluate result in the first line. So I'm really puzzled why there is still a large difference between the training loss and the inference loss (the training loss is worse): the training sample and the validation sample are the same one, so I think the results should also be the same, or at least very close.

I suspect the problem is caused by the BN layer, because of the large difference between training and inference behavior. However, I had already set trainable = False on the BN layer after loading the pre-trained weights and before calling model.fit, and the problem was not solved:

out = tf.keras.layers.BatchNormalization(trainable=False)(out)

I still suspect the BN layer, and wonder whether setting trainable=False is enough to keep the parameters of BN unchanged.
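As an alternative to passing trainable=False at construction time, the BN layers of the already-built model can be frozen after loading the weights. A sketch (the optimizer and loss are placeholders); note that in Keras a change to layer.trainable only takes effect after the model is compiled again:

```python
import tensorflow as tf

def freeze_batchnorm(model):
    """Freeze every BatchNormalization layer of an already-built model.

    Changing `layer.trainable` has no effect on a compiled model until
    `model.compile()` is called again.
    """
    for layer in model.layers:
        if isinstance(layer, tf.keras.layers.BatchNormalization):
            layer.trainable = False
    # Placeholder optimizer/loss; substitute the model's real ones.
    model.compile(optimizer='adam', loss='mse')
    return model
```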

Can anyone give me some advice? Thanks a lot in advance for your help. Sorry for my English, but I tried my best to explain my problem.

Slightly awkwardly, I have found a strange way to solve the problem in another question: Keras: Accuracy Drops While Finetuning Inception.

Actually, I don't think it's a sufficient answer, but when I add

 tf.keras.backend.set_learning_phase(1)

before model.compile(), the result becomes much more normal, although some problems still remain:

1/1 [==============================] - 0s 246ms/step
-6.087581634521484
Train on 1 samples, validate on 1 samples
Epoch 1/10
 - 1s - loss: -6.0876e+00 - val_loss: -6.0893e+00
Epoch 2/10
 - 0s - loss: -6.0893e+00 - val_loss: -6.0948e+00
Epoch 3/10
 - 0s - loss: -6.0948e+00 - val_loss: -6.0903e+00
Epoch 4/10
 - 0s - loss: -6.0903e+00 - val_loss: -6.0927e+00

This is what I want, but I am still puzzled. First, why does it work — what does tf.keras.backend.set_learning_phase(1) do? In addition, I set layers.trainable=True, so why does the BN layer work normally in this case? Then, why do the loss and the val_loss still have a very small difference? Since the sample is the same, what causes this? Finally, I find that whether I use tf.keras.backend.set_learning_phase(0) or tf.keras.backend.set_learning_phase(1), the results are similar and normal. The following is the result with tf.keras.backend.set_learning_phase(0):

1/1 [==============================] - 0s 242ms/step
-6.087581634521484
Train on 1 samples, validate on 1 samples
Epoch 1/10
 - 1s - loss: -6.0876e+00 - val_loss: -6.0775e+00
Epoch 2/10
 - 0s - loss: -6.0775e+00 - val_loss: -6.0925e+00
Epoch 3/10
 - 0s - loss: -6.0925e+00 - val_loss: -6.0908e+00
Epoch 4/10
 - 0s - loss: -6.0908e+00 - val_loss: -6.0883e+00

It is a little different from tf.keras.backend.set_learning_phase(1), which also awaits a proper explanation.

I'm new to deep learning and Keras, and I benefit a lot from Stack Overflow, both for my knowledge and for my English.

Thanks in advance for the help.

I found possible explanations here: https://github.com/keras-team/keras/pull/9965 and here: https://github.com/keras-team/keras/issues/9214

I had a similar finding in PyTorch that I would like to share. First of all, what is your Keras version? Because after 2.1.3, setting a BN layer's trainable=False makes it behave exactly as in inference mode, meaning it will not normalize the input to zero mean and unit variance (as in training mode), but will instead use the running mean and variance. If you set the learning phase to 1, then BN essentially becomes instance norm: it ignores the running mean and variance and just normalizes to zero mean and unit variance, which might be your desired behavior.
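The two behaviors described above can be sketched numerically, independently of Keras; the running mean and variance below are made-up values standing in for statistics accumulated during pre-training:

```python
import numpy as np

def bn_train(x, eps=1e-3):
    # Training mode: normalize with the statistics of the current batch.
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def bn_infer(x, running_mean, running_var, eps=1e-3):
    # Inference mode: normalize with the stored running statistics.
    return (x - running_mean) / np.sqrt(running_var + eps)

batch = np.array([[1.0], [3.0]])
# Batch statistics re-center the current batch around zero...
print(bn_train(batch).mean(axis=0))
# ...while running statistics from pre-training (made-up values here)
# can map the same batch somewhere else entirely.
print(bn_infer(batch, running_mean=10.0, running_var=4.0))
```

This is why, with a single fine-tuning sample, the two modes can report very different losses on the exact same data.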

Reference link to the Keras release notes: https://github.com/keras-team/keras/releases/tag/2.1.3

API changes: the trainable attribute in BatchNormalization now disables the updates of the batch statistics (i.e. if trainable == False, the layer will now run 100% in inference mode).

