CNN training accuracy stagnates with BatchNorm, quickly overfits without
I have 2 types of grayscale images, let's say a car and a plane. In my training set, I have 1000 images (about a 50/50 split). In this training set, all of my plane examples are on a white background, whereas all of the car examples are on a black background (this is done on purpose; the model should ultimately learn to differentiate between a car and a plane, not their backgrounds).
As a simple proof that a model will quickly overfit to the backgrounds, I created a CNN. However, I'm running into 2 weird scenarios:
If I add BatchNorm anywhere between a conv layer and another layer, my training accuracy seems to hover around 50% and can't improve.
If I remove BatchNorm, my training accuracy quickly skyrockets to about 98%. Even though I use my training dataset to create the validation dataset (so the validation dataset has the same black/white background issue), my validation accuracy hovers around 50%. I would expect the overfitting on my training dataset to be caused by the black and white backgrounds, which my validation dataset also has, so the model should be able to predict on it just as well.
I've attached my code. I get the data as a 1x4096 vector, so I reshape it into a 64x64 image. When I uncomment any of the BatchNorm steps in my code below, the training accuracy seems to hover around 50%.
#Normalize training data
self.x = self.x.astype('float32')
self.x /= 255
numSamples = self.x.shape[0]
#Reconstruct images
width = 64
height = 64
xInput = self.x.reshape(numSamples,1,height,width)
y_test = to_categorical(labels, 2)
#Split data to get validation set
X_train, X_test, y_train, y_test = train_test_split(xInput, y_test, test_size=0.3, random_state=0)
#Construct model
self.model = Sequential()
self.model.add(Conv2D(64, kernel_size=(3, 3), strides=(1, 1),
                      activation='relu',
                      input_shape=(1, 64, 64), data_format='channels_first',
                      activity_regularizer=regularizers.l1(0.01)))
#self.model.add(BatchNormalization())
self.model.add(MaxPooling2D((2, 2)))
self.model.add(Dropout(0.5, noise_shape=None))
self.model.add(Conv2D(128, kernel_size=(3, 3), strides=(1, 1), activation='relu'))
#self.model.add(BatchNormalization())
self.model.add(MaxPooling2D((2, 2)))
self.model.add(Dropout(0.5, noise_shape=None))
self.model.add(Conv2D(256, kernel_size=(3, 3), strides=(1, 1), activation='relu'))
#self.model.add(BatchNormalization())
self.model.add(MaxPooling2D((2, 2)))
self.model.add(Dropout(0.5, noise_shape=None))
self.model.add(Flatten())
self.model.add(Dense(1000, activation='relu', activity_regularizer=regularizers.l2(0.01)))
self.model.add(BatchNormalization())
self.model.add(Dropout(0.5, noise_shape=None))
self.model.add(Dense(units = 2, activation = 'softmax', kernel_initializer='lecun_normal'))
self.model.compile(loss='categorical_crossentropy',
                   optimizer='adam',
                   metrics=['accuracy'])
self.model.fit(X_train, y_train,
               batch_size=32,
               epochs=25,
               verbose=2,
               validation_data=(X_test, y_test),
               callbacks=[EarlyStopping(monitor='val_acc', patience=5)])
I think there are a number of potential improvements to the architecture of your ANN, and one fundamental problem.
The fundamental challenge is the way your training set has been built: black and white backgrounds. If the intention was that the background should not play a role, why not make all of them white or all of them black? Mind that an ANN, like nearly any machine learning algorithm, will attempt to find what differentiates your classes, and in this case that will simply be the background. Why look at the tiny details of a car vs. an airplane when the background provides such a clear and rewarding differentiation?
Solution: Make the background uniform for both sets. Then your ANN will be oblivious to it.
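To illustrate how rewarding the background shortcut is, here is a minimal NumPy sketch on synthetic data. Everything in it is an assumption made for illustration: `make_image` is a hypothetical stand-in for your images (a flat background plus a noisy central patch), and the "classifier" is just a threshold on mean brightness, not your CNN.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_image(background, size=64):
    """Hypothetical stand-in for a photo: flat background plus a noisy 'object' patch."""
    img = np.full((size, size), background, dtype=np.float32)
    # The object occupies the central 24x24 region with random gray levels.
    img[20:44, 20:44] = rng.uniform(0.3, 0.7, size=(24, 24))
    return img

# Planes on white (1.0), cars on black (0.0), as in the question.
planes = np.stack([make_image(1.0) for _ in range(100)])
cars = np.stack([make_image(0.0) for _ in range(100)])

x = np.concatenate([planes, cars])
y = np.concatenate([np.ones(100), np.zeros(100)])  # 1 = plane, 0 = car

# A "classifier" that never looks at the object: threshold the mean brightness.
pred = (x.mean(axis=(1, 2)) > 0.5).astype(np.float64)
biased_accuracy = (pred == y).mean()

# With a uniform (white) background for both classes, the same rule is useless.
cars_white = np.stack([make_image(1.0) for _ in range(100)])
x_uniform = np.concatenate([planes, cars_white])
pred_uniform = (x_uniform.mean(axis=(1, 2)) > 0.5).astype(np.float64)
uniform_accuracy = (pred_uniform == y).mean()

print(biased_accuracy, uniform_accuracy)
```

On the mixed backgrounds the brightness rule separates the two classes perfectly without ever looking at the object; once both classes share a background it drops to chance, so a model has to look at the object itself.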
Why was Batch Norm messing up training accuracy? As you noted yourself, test accuracy was still poor. Batch Norm was fixing the covariate shift problem. The "problem" was manifesting later, in seemingly great training accuracy and poor test accuracy. Andrew Ng has a great video on Batch Normalisation, with a segment on covariate shift.
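To make the covariate shift remark a bit more concrete, here is a rough NumPy sketch of the normalization step Batch Norm applies during training. It is a simplification and not the Keras implementation: `batch_norm_train` is a hypothetical helper, `gamma` and `beta` are fixed instead of learned, and the moving averages used at inference time are left out.

```python
import numpy as np

def batch_norm_train(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-mode batch norm: normalize each feature over the batch axis (axis 0)."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
# Hypothetical activations: a batch of 32 samples with 8 features each.
features = rng.normal(size=(32, 8))
# The same activations with a constant offset, i.e. a shifted input distribution.
shifted = features + 5.0

out_original = batch_norm_train(features)
out_shifted = batch_norm_train(shifted)

# The offset is removed: both batches normalize to the same values.
print(np.allclose(out_original, out_shifted))
```

The point of the sketch is only that the layer after Batch Norm sees the same distribution regardless of a constant shift in its input, which is the sense in which Batch Norm counteracts covariate shift.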
Fixing the training set should fix the issue. Some other things:
Replace categorical_crossentropy with binary_crossentropy.
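As a sketch of what that swap amounts to for a two-class problem, the NumPy snippet below compares the two losses by hand. It assumes the swap is paired with a single sigmoid output unit and 0/1 labels instead of a 2-unit softmax with one-hot labels (in Keras terms, something like `Dense(1, activation='sigmoid')`); that pairing is my assumption. The helper functions are written out for illustration and are not the Keras implementations.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=-1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def categorical_crossentropy(one_hot, probs):
    # Mean over samples of -sum_k y_k * log(p_k).
    return -(one_hot * np.log(probs)).sum(axis=-1).mean()

def binary_crossentropy(labels, p):
    # Mean over samples of -(y * log(p) + (1 - y) * log(1 - p)).
    return -(labels * np.log(p) + (1 - labels) * np.log(1 - p)).mean()

z = np.array([2.0, -1.0, 0.5, -3.0])     # hypothetical logits for class 1 (plane)
labels = np.array([1.0, 0.0, 0.0, 1.0])  # 1 = plane, 0 = car

# Two-unit softmax over logits (0, z) gives P(class 1) = sigmoid(z).
two_unit_logits = np.stack([np.zeros_like(z), z], axis=-1)
one_hot = np.stack([1 - labels, labels], axis=-1)

cce = categorical_crossentropy(one_hot, softmax(two_unit_logits))
bce = binary_crossentropy(labels, sigmoid(z))

print(np.isclose(cce, bce))
```

Under that assumption the two setups compute the same loss value, so the change mostly simplifies the output layer and label encoding rather than changing what is optimized.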