Adding padding='same' argument to Conv2D layers broke the model

I created this model:

from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D, Input, Dense
from tensorflow.keras.layers import Reshape, Flatten
from tensorflow.keras import Model


def create_DeepCAPCHA(input_shape=(28,28,1), n_prediction=1, n_class=10, optimizer='adam',
                      show_summary=True):
    inputs = Input(input_shape)
    # Three conv/pool blocks; padding='same' preserves spatial size through each conv
    x = Conv2D(filters=32, kernel_size=3, activation='relu', padding='same')(inputs)
    x = MaxPooling2D(pool_size=2)(x)
    x = Conv2D(filters=48, kernel_size=3, activation='relu', padding='same')(x)
    x = MaxPooling2D(pool_size=2)(x)
    x = Conv2D(filters=64, kernel_size=3, activation='relu', padding='same')(x)
    x = MaxPooling2D(pool_size=2)(x)
    # Head: one softmax distribution over n_class per prediction
    x = Flatten()(x)
    x = Dense(512, activation='relu')(x)
    x = Dense(units=n_prediction*n_class, activation='softmax')(x)
    outputs = Reshape((n_prediction, n_class))(x)
    model = Model(inputs, outputs)
    model.compile(optimizer=optimizer,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    if show_summary:
        model.summary()
    return model

I tried the model on the MNIST dataset:

import tensorflow as tf
import numpy as np

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

inputs = x_train
# One-hot encode the labels, then add an axis so the targets have shape
# (60000, 1, 10), matching the model's (n_prediction, n_class) output
outputs = tf.keras.utils.to_categorical(y_train, num_classes=10)
outputs = np.expand_dims(outputs, 1)

model = create_DeepCAPCHA(input_shape=(28,28,1), n_prediction=1, n_class=10)
model.fit(inputs, outputs, epochs=10, validation_split=0.1)

but it failed to converge (stuck at 10% accuracy, i.e. the same as random guessing). Yet when I remove the padding='same' argument from the Conv2D layers, it works flawlessly:

def working_DeepCAPCHA(input_shape=(28,28,1),n_prediction=1,n_class=10,optimizer='adam',
                      show_summary=True):
    inputs = Input(input_shape)
    x = Conv2D(filters=32, kernel_size=3, activation='relu')(inputs)
    x = MaxPooling2D(pool_size=2)(x)
    x = Conv2D(filters=48, kernel_size=3, activation='relu')(x)
    x = MaxPooling2D(pool_size=2)(x)
    x = Conv2D(filters=64, kernel_size=3, activation='relu')(x)
    x = MaxPooling2D(pool_size=2)(x)
    x = Flatten()(x)
    x = Dense(512, activation='relu')(x)
    x = Dense(units=n_prediction*n_class, activation='softmax')(x)
    outputs = Reshape((n_prediction,n_class))(x)
    model = Model(inputs, outputs)
    model.compile(optimizer=optimizer,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    if show_summary:
        model.summary()
    return model

Does anyone have any idea what the problem is?

Thank you for sharing, it was really interesting to me. So I wrote the code and tested several scenarios. Note that what I'm going to say is just my guess and I'm not sure about it.

My conclusion from those tests is that no padding (i.e. valid padding) works because it produces a (1, 1, 64) output shape for the last conv layer. But if you set the padding to same, that layer produces (3, 3, 64), and because the next layer is a big Dense layer, this multiplies the number of parameters feeding into it by 9 (I expected this to somehow result in overfitting), which seems to make it much harder for the network to find good values for the parameters. So I tried some different ways to reduce the output of the last conv layer to (1, 1, 64), as below:

  • using one more conv layer + max pooling
  • changing the last max pooling to a pool_size of 4
  • using a stride of 2 for one of the conv layers
  • changing the filters of the last conv layer to 20

and they all worked well (a quick shape check follows below). Even changing the Dense units from 512 to 64 helps as well (note that even then you may still get poor results once in a while, I guess because of bad initialization).
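As a quick check of the shapes quoted above, here is a minimal sketch (assuming TensorFlow 2.x; conv_stack is a hypothetical helper of mine that mirrors the question's three conv/pool blocks, with the last pool_size as a parameter so the second fix can be tried as well):

from tensorflow.keras.layers import Conv2D, MaxPooling2D, Input
from tensorflow.keras import Model

def conv_stack(padding, last_pool=2):
    # Mirror the question's conv/pool stack on a 28x28x1 input
    inputs = Input((28, 28, 1))
    x = inputs
    for filters, pool in ((32, 2), (48, 2), (64, last_pool)):
        x = Conv2D(filters, kernel_size=3, activation='relu', padding=padding)(x)
        x = MaxPooling2D(pool_size=pool)(x)
    return Model(inputs, x)

print(conv_stack('valid').output_shape)              # (None, 1, 1, 64)
print(conv_stack('same').output_shape)               # (None, 3, 3, 64)
print(conv_stack('same', last_pool=4).output_shape)  # (None, 1, 1, 64) -- the pool_size=4 fix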

Then I changed the shape of the last conv layer's output to (2, 2, 64), and the chance of getting a good result (more than 90% accuracy) dropped (a lot of the time I got 10% accuracy).
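Reusing the hypothetical conv_stack helper above (the answer does not say which change was used, so this is my assumption), one way to reproduce a (2, 2, 64) last conv output from the same-padding stack is a last pool_size of 3:

print(conv_stack('same', last_pool=3).output_shape)  # (None, 2, 2, 64), since floor(7 / 3) = 2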

So it seems that a large number of parameters can confuse the model. But if you want to know why the network does not overfit, I have no answer for you.
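To put a number on "a lot of parameters", here is a quick comparison using the two functions defined in the question (the totals are my own computation and assume the default arguments):

same_model = create_DeepCAPCHA(show_summary=False)    # padding='same' everywhere
valid_model = working_DeepCAPCHA(show_summary=False)  # valid (no) padding

# The only difference is the first Dense layer: 576*512 + 512 = 295,424
# parameters with 'same' vs 64*512 + 512 = 33,280 with 'valid'.
print(same_model.count_params())   # 342458
print(valid_model.count_params())  # 80314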
