CNN training accuracy stagnates with BatchNorm, quickly overfits without

I have 2 types of grayscale images, let's say a car and a plane. In my training set, I have 1000 images (about a 50/50 split). In this training set, all of my plane examples are on a white background, whereas all of my car examples are on a black background (this is done on purpose, so I can test whether the model ultimately learns to differentiate between a car and a plane rather than their backgrounds).

As a simple proof that a model will quickly overfit to the backgrounds, I created a CNN. However, I'm running into 2 weird scenarios:

  1. If I add BatchNorm anywhere between a conv layer and another layer, my training accuracy seems to hover around 50% and can't improve.

  2. If I remove BatchNorm, my training accuracy quickly skyrockets to 98%-ish. Despite me using my training dataset to create a validation dataset (thus, this validation dataset also has the black/white background issue), my validation accuracy hovers around 50%. I would expect the overfitting on my training dataset to be caused by the black and white backgrounds, which my validation dataset also has, so the model should be able to predict against it.

I've attached my code. I get the data as a 1x4096 vector, so I reshape it into a 64x64 image. When I uncomment any of the BatchNorm steps in the code below, the training accuracy seems to hover around 50%.

#Imports needed by this snippet (module level in the full file;
#use the tensorflow.keras equivalents on TF2)
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense, BatchNormalization
from keras.utils import to_categorical
from keras import regularizers
from keras.callbacks import EarlyStopping

        #Normalize training data
        self.x = self.x.astype('float32')
        self.x /= 255

        numSamples = self.x.shape[0]
        #Reconstruct images
        width = 64
        height = 64
        xInput = self.x.reshape(numSamples,1,height,width)

        #One-hot encode the labels (`labels` is defined elsewhere in the class)
        yCategorical = to_categorical(labels, 2)

        #Split data to get validation set
        X_train, X_test, y_train, y_test = train_test_split(xInput, yCategorical, test_size=0.3, random_state=0)

        #Construct model
        self.model = Sequential()
        self.model.add(Conv2D(64, kernel_size=(3, 3), strides=(1, 1),
                 activation='relu',
                 input_shape=(1,64,64), data_format='channels_first',activity_regularizer=regularizers.l1(0.01)))
        #self.model.add(BatchNormalization())
        self.model.add(MaxPooling2D((2, 2)))
        self.model.add(Dropout(0.5, noise_shape=None)) 
        self.model.add(Conv2D(128, kernel_size=(3, 3), strides=(1, 1), activation='relu'))
        #self.model.add(BatchNormalization())
        self.model.add(MaxPooling2D((2, 2)))
        self.model.add(Dropout(0.5, noise_shape=None)) 
        self.model.add(Conv2D(256, kernel_size=(3, 3), strides=(1, 1), activation='relu'))
        #self.model.add(BatchNormalization())
        self.model.add(MaxPooling2D((2, 2)))
        self.model.add(Dropout(0.5, noise_shape=None)) 
        self.model.add(Flatten())
        self.model.add(Dense(1000, activation='relu', activity_regularizer=regularizers.l2(0.01)))
        self.model.add(BatchNormalization())
        self.model.add(Dropout(0.5, noise_shape=None)) 
        self.model.add(Dense(units = 2, activation = 'softmax', kernel_initializer='lecun_normal'))

        self.model.compile(loss='categorical_crossentropy',
             optimizer='adam',
             metrics=['accuracy'])

        self.model.fit(X_train, y_train,
            batch_size=32,
            epochs=25,
            verbose=2,
            validation_data = (X_test,y_test),
            callbacks = [EarlyStopping(monitor = 'val_acc', patience =5)]) # 'val_accuracy' in newer Keras versions

I think there are a number of potential improvements to the architecture of your ANN, as well as one fundamental problem.

The fundamental challenge is the way your training set has been built: black and white backgrounds. If the intention was that the background should not play a role, why not make all of them white or black? Mind that an ANN, like almost any machine learning algorithm, will attempt to find whatever differentiates your classes. And in this case it will simply be the background. Why look at the tiny details of a car vs. an airplane, when the background provides such a clear and rewarding differentiation?

Solution: make the background uniform for both sets. Then your ANN will be oblivious to it.
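If re-collecting the data is not an option, here is a minimal sketch of one way to do that (assuming the 1x4096 grayscale vectors from the question, already scaled to [0, 1]; the border heuristic and the 0.5 threshold are illustrative assumptions, not a fixed recipe): invert any image whose border is mostly bright, so every sample ends up on a dark background.

    import numpy as np

    def unify_background(x, width=64, height=64, threshold=0.5):
        """Invert images whose border is mostly bright so that all samples
        share a dark background. Assumes x has shape (N, 4096), values in [0, 1]."""
        images = x.reshape(-1, height, width).copy()
        for img in images:
            # Use the border ring as a cheap estimate of the background colour.
            border = np.concatenate([img[0, :], img[-1, :], img[:, 0], img[:, -1]])
            if border.mean() > threshold:  # bright background -> invert the image
                img[:] = 1.0 - img
        return images.reshape(-1, height * width)

Applied to self.x right after the division by 255, both classes would then sit on near-black backgrounds.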

Why was Batch Norm messing up training accuracy? As you noted yourself, test accuracy was still poor. Batch Norm was fixing the covariate shift problem. Without it, the "problem" was manifesting later as seemingly great training accuracy - and a poor test score. There is a great video on Batch Normalisation, with a piece on covariate shift, from Andrew Ng here.
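If you want to see the mechanism for yourself, here is a standalone probe (assuming TensorFlow 2's Keras; this is not part of the model above) showing that BatchNormalization standardises activations with the statistics of the current batch in training mode, but with its moving averages at inference:

    import numpy as np
    import tensorflow as tf

    bn = tf.keras.layers.BatchNormalization()
    x = np.random.normal(loc=5.0, scale=2.0, size=(32, 10)).astype('float32')

    # Training mode: normalised with this batch's own mean/variance,
    # so the output is roughly zero-mean, unit-variance.
    y_train = bn(x, training=True)

    # Inference mode: uses the moving averages, which after a single
    # batch are still close to their initial values (mean 0, variance 1).
    y_infer = bn(x, training=False)

    print(float(tf.reduce_mean(y_train)), float(tf.reduce_mean(y_infer)))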

Fixing the training set should fix the issue. Some other things (a combined sketch follows the list):

  • At the very end you have 2 dense units, but your classification is binary. Make it a single unit with sigmoid activation.
  • As pointed out by @Upasana Mittal, replace categorical_crossentropy with binary_crossentropy.
  • Consider using smaller dropout rates. Mind that you don't have that much data, so don't always discard half of it. Increase dropout only after you have evidence of overfitting.
  • Using Conv2D with strides can be better than simple max pooling.
  • You have a lot of filters for something that does not seem to be all that complicated. Consider a severe reduction in the number of filters, and increase the number only when you see that the ANN does not have enough capacity for learning. You have only 2 classes here, and the features differentiating a car from a jet are not that subtle.
  • Consider using a smaller number of layers. Same argument.
  • Using at least 2 stacked 3x3 Conv2D layers can yield better results.
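Putting these points together, a minimal sketch of a slimmer model (the filter counts, the dense width, and the 0.25 dropout rate are illustrative assumptions, not tuned values):

    from keras.models import Sequential
    from keras.layers import Conv2D, Flatten, Dense, Dropout

    model = Sequential([
        # Two stacked 3x3 convs; the second downsamples with stride 2
        # instead of max pooling.
        Conv2D(16, (3, 3), activation='relu', data_format='channels_first',
               input_shape=(1, 64, 64)),
        Conv2D(16, (3, 3), strides=(2, 2), activation='relu',
               data_format='channels_first'),
        Conv2D(32, (3, 3), strides=(2, 2), activation='relu',
               data_format='channels_first'),
        Flatten(),
        Dense(64, activation='relu'),
        Dropout(0.25),                   # milder dropout until overfitting shows
        Dense(1, activation='sigmoid'),  # single unit for binary classification
    ])

    model.compile(loss='binary_crossentropy', optimizer='adam',
                  metrics=['accuracy'])

With the single sigmoid unit, feed the raw 0/1 labels directly rather than the to_categorical output.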
