resnet50 theory question - output shape and dense layer units?
I am learning TensorFlow/Keras for image classification and I feel like I'm missing a critical part of the theory.
The task I am currently working on uses a pretrained model (ResNet50 in this case) to classify a small data set, with limited training time.
The data set is 1600 color photos of fruit, 150 x 150 each, falling into 12 classes. I am using a generator for the images:
datagen = ImageDataGenerator(
    validation_split=0.25,
    rescale=1/255,
    horizontal_flip=True,
    vertical_flip=True,
    width_shift_range=0.2,
    height_shift_range=0.2,
    rotation_range=90)

train_datagen_flow = datagen.flow_from_directory(
    '/datasets/fruits_small/',
    target_size=(150, 150),
    batch_size=32,
    class_mode='sparse',
    subset='training',
    seed=12345)

val_datagen_flow = datagen.flow_from_directory(
    '/datasets/fruits_small/',
    target_size=(150, 150),
    batch_size=32,
    class_mode='sparse',
    subset='validation',
    seed=12345)
features, target = next(train_datagen_flow)
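One thing worth noting about the setup above (my observation, not something from the original question): because a single augmenting `ImageDataGenerator` feeds both subsets, the validation images are also flipped, shifted, and rotated. A common fix is two generators that share the same `validation_split`, with augmentation only on the training side. A minimal sketch:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Training generator: augmentation + rescaling.
train_gen = ImageDataGenerator(
    validation_split=0.25,
    rescale=1/255,
    horizontal_flip=True,
    vertical_flip=True,
    width_shift_range=0.2,
    height_shift_range=0.2,
    rotation_range=90)

# Validation generator: rescaling only, no augmentation,
# so validation metrics reflect unmodified images.
val_gen = ImageDataGenerator(
    validation_split=0.25,
    rescale=1/255)
```

You would then call `train_gen.flow_from_directory(..., subset='training')` and `val_gen.flow_from_directory(..., subset='validation')` as above.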
Here are the layers I am using:
backbone = ResNet50(input_shape=(150, 150, 3), weights='imagenet', include_top=False)
backbone.trainable = False
model = Sequential()
optimizer = Adam(learning_rate=0.001)  # `lr=` in older Keras versions
model.add(backbone)
model.add(GlobalMaxPooling2D())
model.add(Dense(2048, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(512, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(12, activation='softmax'))
model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=['acc'])
Now, this is my first attempt at using GlobalMaxPooling2D and ResNet50, and I am experiencing MASSIVE overfitting, I presume because of the small data set.
I've done some reading on the subject and tried a few normalization efforts, with limited success.
In conversation with my tutor, he suggested that I think more critically about the output of the ResNet model when selecting the parameters for my dense layers.
This comment made me realize that I have basically been choosing the units for the dense layers arbitrarily. It sounds like I should understand something about the output of the previous layer when building a new one, but I'm not sure what, and I feel like I am missing something critical.
This is what my current model summary looks like:
Model: "sequential_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
resnet50 (Model) (None, 5, 5, 2048) 23587712
_________________________________________________________________
global_max_pooling2d_3 (Glob (None, 2048) 0
_________________________________________________________________
dense_7 (Dense) (None, 2048) 4196352
_________________________________________________________________
batch_normalization_2 (Batch (None, 2048) 8192
_________________________________________________________________
dense_8 (Dense) (None, 512) 1049088
_________________________________________________________________
batch_normalization_3 (Batch (None, 512) 2048
_________________________________________________________________
dense_9 (Dense) (None, 12) 6156
=================================================================
Total params: 28,849,548
Trainable params: 5,256,716
Non-trainable params: 23,592,832
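The first row of that summary is the key one: the frozen backbone emits a `(None, 5, 5, 2048)` feature map, which the pooling layer collapses to a 2048-vector. A quick way to confirm this directly (a sketch; `weights=None` uses random weights and avoids the ImageNet download, which doesn't change the shape):

```python
from tensorflow.keras.applications import ResNet50

# Rebuild the backbone just to inspect the output shape the dense head receives.
backbone = ResNet50(input_shape=(150, 150, 3), weights=None, include_top=False)
print(backbone.output_shape)  # (None, 5, 5, 2048), matching the summary above
```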
Here is what my current output looks like:
Epoch 1/3
40/40 [==============================] - 363s 9s/step - loss: 0.5553 - acc: 0.8373 - val_loss: 3.8422 - val_acc: 0.1295
Epoch 2/3
40/40 [==============================] - 354s 9s/step - loss: 0.1621 - acc: 0.9423 - val_loss: 6.3961 - val_acc: 0.1295
Epoch 3/3
40/40 [==============================] - 357s 9s/step - loss: 0.1028 - acc: 0.9716 - val_loss: 4.8895 - val_acc: 0.1295
So I've read about freezing the ResNet layers during training to help with overfitting, and about regularization (which is what I am attempting with the batch normalization? though this approach seems to be considered questionable by a lot of people). I've also tried using dropout on the first and second dense layers, as well as increasing the effective data set size with augmentation (I've got rotations and such).
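For reference, a hedged sketch of the dropout variant of the head described above, built standalone on the backbone's `(5, 5, 2048)` output via the functional API; the 0.5 rates are illustrative, not from the original attempt:

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense, Dropout, GlobalMaxPooling2D

# Head only, taking the frozen backbone's (5, 5, 2048) feature map as input.
inputs = Input(shape=(5, 5, 2048))
x = GlobalMaxPooling2D()(inputs)
x = Dense(2048, activation='relu')(x)
x = Dropout(0.5)(x)  # illustrative rate
x = Dense(512, activation='relu')(x)
x = Dropout(0.5)(x)  # illustrative rate
outputs = Dense(12, activation='softmax')(x)
dropout_head = Model(inputs, outputs)
```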
Any input would be appreciated!
So, I found that I had a misunderstanding about the shape of the output from the ResNet/global pooling layer: it had a shape of 2048, and I was thinking that meant I needed my first dense layer to have 2048 units, which was causing significant overfitting issues.
I ultimately changed my dense layers to have 256, then 64, and finally 12 units (because I have 12 classes to categorize), and that significantly improved performance.
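The revised head can be sketched as follows (functional API on the backbone's `(5, 5, 2048)` output; `relu`/`softmax` activations are carried over from the original head, the rest is assumed):

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense, GlobalMaxPooling2D

# Smaller dense head: 256 -> 64 -> 12, far fewer trainable parameters
# than the original 2048 -> 512 -> 12 stack.
inputs = Input(shape=(5, 5, 2048))
x = GlobalMaxPooling2D()(inputs)
x = Dense(256, activation='relu')(x)
x = Dense(64, activation='relu')(x)
outputs = Dense(12, activation='softmax')(x)
head = Model(inputs, outputs)
```

With roughly 0.5M trainable parameters instead of 5M+, the head has much less capacity to memorize a 1600-image training set.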