Massive overfitting during ResNet50 transfer learning

Question

This is my first attempt at doing something with CNNs, so I am probably doing something very stupid, but I can't figure out where I am going wrong...

The model seems to be learning fine, but the validation accuracy is not improving (ever, even after the first epoch), and the validation loss is actually increasing with time. It doesn't look like I am overfitting (after 1 epoch?), so something must be off in some other way.

[Figure: typical network behaviour]

I am training a CNN: I have ~100k images of various plants (1000 classes) and want to fine-tune ResNet50 to create a multiclass classifier. Images are of various sizes; I load them like so:

import numpy as np
from keras.preprocessing import image

def path_to_tensor(img_path):
    # loads RGB image as PIL.Image.Image type
    img = image.load_img(img_path, target_size=(IMG_HEIGHT, IMG_HEIGHT))
    # convert PIL.Image.Image type to 3D tensor with shape (IMG_HEIGHT, IMG_HEIGHT, 3)
    x = image.img_to_array(img)
    # convert 3D tensor to 4D tensor with shape (1, IMG_HEIGHT, IMG_HEIGHT, 3) and return 4D tensor
    return np.expand_dims(x, axis=0)

def paths_to_tensor(img_paths):
    list_of_tensors = [path_to_tensor(img_path) for img_path in img_paths] #can use tqdm(img_paths) for data
    return np.vstack(list_of_tensors)

The database is large (it does not fit into memory), so I had to create my own generator to handle both reading from disk and augmentation. (I know Keras has .flow_from_directory(), but my data is not structured this way: it is just a dump of 100k images mixed with 100k metadata files.) I probably should have written a script to structure them better instead of creating my own generators, but the problem is likely somewhere else.

The generator version below doesn't do any augmentation for the time being, just rescaling:

def generate_batches_from_train_folder(images_to_read, labels, batchsize = BATCH_SIZE):    

    #Generator that returns batches of images ('xs') and labels ('ys') from the train folder
    #:param string filepath: Full filepath of files to read - this needs to be a list of image files
    #:param np.array: list of all labels for the images_to_read - those need to be one-hot-encoded
    #:param int batchsize: Size of the batches that should be generated.
    #:return: (ndarray, ndarray) (xs, ys): Yields a tuple which contains a full batch of images and labels. 

    dimensions = (BATCH_SIZE, IMG_HEIGHT, IMG_HEIGHT, 3)

    train_datagen = ImageDataGenerator(
        rescale=1./255,
        #rotation_range=20,
        #zoom_range=0.2, 
        #fill_mode='nearest',
        #horizontal_flip=True
    )

    # needs to be on a infinite loop for the generator to work
    while 1:
        filesize = len(images_to_read)

        # count how many entries we have read
        n_entries = 0
        # as long as we haven't read all entries from the file: keep reading
        while n_entries < (filesize - batchsize):

            # start the next batch at index 0
            # create numpy arrays of input data (features) 
            # - this is already shaped as a tensor (output of the support function paths_to_tensor)
            xs = paths_to_tensor(images_to_read[n_entries : n_entries + batchsize])

            # and label info. Contains 1000 labels in my case for each possible plant species
            ys = labels[n_entries : n_entries + batchsize]

            # we have read one more batch from this file
            n_entries += batchsize

            #perform online augmentation on the xs and ys
            augmented_generator = train_datagen.flow(xs, ys, batch_size = batchsize)

            # yield inside the inner loop so that every batch is emitted
            yield next(augmented_generator)

This is how I define my model:

def get_model():

    # define the model
    base_net = ResNet50(input_shape=DIMENSIONS, weights='imagenet', include_top=False)

    # Freeze the layers which you don't want to train. Here I am freezing all of them
    for layer in base_net.layers:
        layer.trainable = False

    x = base_net.output

    #for resnet50
    x = Flatten()(x)
    x = Dense(512, activation="relu")(x)
    x = Dropout(0.5)(x)
    x = Dense(1000, activation='softmax', name='predictions')(x)

    model = Model(inputs=base_net.input, outputs=x)

    # compile the model 
    model.compile(
        loss='categorical_crossentropy',
        optimizer=optimizers.Adam(1e-3),
        metrics=['acc'])

    return model

So, as a result, I have 1,562,088 trainable parameters for roughly 70k images.
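
As a quick check of that figure, here is a small sketch of the arithmetic. It assumes the Flatten() layer produces 2048 features, which is not stated in the question and depends on the input resolution and the Keras version; the helper name dense_params is made up for illustration.

```python
# Parameter count of the new head, ASSUMING the flattened base output
# has 2048 features (an assumption, not stated in the question).
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

flattened_features = 2048  # assumed size of the Flatten() output
hidden = dense_params(flattened_features, 512)  # Dense(512)
output = dense_params(512, 1000)                # Dense(1000)
total_trainable = hidden + output
print(hidden, output, total_trainable)
```

Under that assumption the total matches the number quoted above. If the base network instead returned a 7x7x2048 feature map, the same formula would give roughly 51.9 million trainable parameters.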

I then use 5-fold cross-validation, but the model doesn't work on any of the folds, so I will not include the full code here. The relevant bit is this:

trial_fold = temp_model.fit_generator(
                train_generator,
                steps_per_epoch = len(X_train_path) // BATCH_SIZE,
                epochs = 50,
                verbose = 1,
                validation_data = (xs_v,ys_v),#valid_generator,
                #validation_steps= len(X_valid_path) // BATCH_SIZE,
                callbacks = callbacks,
                shuffle=True)

I have tried various things: made sure my generator is actually working, played with the last few layers of the network by reducing the size of the fully connected layer, tried augmentation. Nothing helps...

I don't think the number of parameters in the network is too large. I know other people have done pretty much the same thing and got accuracy closer to 0.5, but my models seem to be overfitting like crazy. Any ideas on how to tackle this will be much appreciated!

Update 1:

I have decided to stop reinventing stuff and sorted my files to work with the .flow_from_directory() procedure. To make sure I am importing the right format (triggered by the Ioannis Nasios comment below), I made sure to use preprocess_input() from Keras's resnet50 application.

I also decided to check whether the model is actually producing something useful: I computed bottleneck features for my dataset and then used a random forest to predict the classes. It did work, and I got an accuracy of around 0.4.

So, I guess I definitely had a problem with the input format of my images. As a next step, I will fine-tune the model (with a new top layer) to see if the problem remains...

Update 2:

I think the problem was with image preprocessing. I ended up not fine-tuning at all: I just extracted the bottleneck layer and trained a linear SVC on it, which gave an accuracy of around 60% on the train set and around 45% on the test set.

Answer 1

You need to use the preprocessing_function argument in ImageDataGenerator.

 train_datagen = ImageDataGenerator(preprocessing_function=keras.applications.resnet50.preprocess_input)

This will ensure that your images are pre-processed as expected for the pre-trained network you are using.
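
To see why this matters, the sketch below compares rescale=1./255 with the "caffe"-style preprocessing that, to my understanding, preprocess_input applies for ResNet50 (RGB to BGR, subtract the ImageNet channel means, no scaling). The function names and the tiny random batch are made up for illustration; this is a NumPy re-implementation, not the Keras code itself.

```python
import numpy as np

# ImageNet channel means in BGR order, as used by "caffe"-style preprocessing.
IMAGENET_BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def caffe_style_preprocess(rgb_batch):
    """Illustrative version: RGB -> BGR, subtract channel means, no scaling."""
    bgr = rgb_batch[..., ::-1].astype(np.float32)
    return bgr - IMAGENET_BGR_MEANS

def rescale_only(rgb_batch):
    """What rescale=1./255 does: map pixel values into [0, 1]."""
    return rgb_batch.astype(np.float32) / 255.0

rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(2, 4, 4, 3)).astype(np.float32)
batch[0, 0, 0] = 0.0    # one black pixel
batch[0, 0, 1] = 255.0  # one white pixel

a = caffe_style_preprocess(batch)
b = rescale_only(batch)
print("caffe-style range:", a.min(), a.max())
print("rescale-only range:", b.min(), b.max())
```

The two inputs differ in channel order, offset, and scale, so a network pre-trained on one sees very unfamiliar values when fed the other.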

Answer 2

Have you found a workaround for your problem? If not, this might be an issue with the batch norm layers in your ResNet. I have faced a similar issue, because in Keras the batch norm layer behaves very differently during training and testing. So you can freeze all BN layers by:

x = BatchNormalization()(x, training=False)

and then try to retrain your network on the same data set. One more thing to keep in mind: during training you should set the learning-phase flag to 1:

import keras.backend as K
K.set_learning_phase(1)

and during testing set this flag to 0. I think it should work after making the above changes.

If you have found any other solution to the problem, please post it here so that others can benefit from it.

Thank you.

Answer 3

I implemented various architectures for transfer learning and observed that, on my custom dataset, models containing BatchNorm layers (e.g. Inception, ResNet, MobileNet) perform a lot worse during evaluation (validation/test) than models without BatchNorm layers (e.g. VGG): ~30% compared to >95% test accuracy. Furthermore, this problem does not occur when saving bottleneck features and using them for classification. There are already a few blog entries, forum threads, issues and pull requests on this topic, and it turns out that the BatchNorm layer, when frozen, uses not the new dataset's statistics but the original dataset's (ImageNet) statistics:

Assume you are building a Computer Vision model but you don't have enough data, so you decide to use one of the pre-trained CNNs of Keras and fine-tune it. Unfortunately, by doing so you get no guarantees that the mean and variance of your new dataset inside the BN layers will be similar to the ones of the original dataset. Remember that at the moment, during training your network will always use the mini-batch statistics either the BN layer is frozen or not; also during inference you will use the previously learned statistics of the frozen BN layers. As a result, if you fine-tune the top layers, their weights will be adjusted to the mean/variance of the new dataset. Nevertheless, during inference they will receive data which are scaled differently because the mean/variance of the original dataset will be used.

cited from http://blog.datumbox.com/the-batch-normalization-layer-of-keras-is-broken/
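
The mismatch described in the quote can be illustrated with a few lines of NumPy. This is only a sketch with made-up numbers: the activations (mean 5, std 3) and the stored statistics (mean 0, variance 1) are hypothetical, and batch_norm is a simplified stand-in for a BN layer with scale 1 and shift 0.

```python
import numpy as np

def batch_norm(x, mean, var, eps=1e-3):
    """Normalize x with the given statistics (scale and shift left at 1 and 0)."""
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
# Hypothetical activations produced by the new dataset: mean 5, std 3.
new_data = rng.normal(loc=5.0, scale=3.0, size=(256, 8))

# Hypothetical moving statistics stored from the original dataset.
stored_mean, stored_var = np.zeros(8), np.ones(8)

# Training mode: statistics come from the current mini-batch.
train_out = batch_norm(new_data, new_data.mean(axis=0), new_data.var(axis=0))
# Inference mode: the stored moving statistics are used instead.
infer_out = batch_norm(new_data, stored_mean, stored_var)

print("train mode mean/std:", train_out.mean(), train_out.std())
print("infer mode mean/std:", infer_out.mean(), infer_out.std())
```

Layers trained on top of the training-mode output see roughly zero-mean, unit-variance inputs, but at inference time they receive values with a very different offset and scale.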

What fixed the problem for me was to freeze all layers and then unfreeze all BatchNormalization layers, so that they use the new dataset's statistics instead of the original statistics:

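from keras.applications import inception_v3
from keras.layers import Input, Dense, Dropout
from keras.models import Model

# build model (train_generator is assumed to be a Keras directory iterator defined elsewhere)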
input_tensor = Input(shape=train_generator.image_shape)
base_model = inception_v3.InceptionV3(input_tensor=input_tensor,
                                      include_top=False,
                                      weights='imagenet',
                                      pooling='avg')
x = base_model.output

# freeze all layers in the base model
base_model.trainable = False

# un-freeze the BatchNorm layers
for layer in base_model.layers:
    if "BatchNormalization" in layer.__class__.__name__:
        layer.trainable = True

# add custom layers
x = Dense(1024, activation='relu')(x)
x = Dropout(0.5)(x)
x = Dense(train_generator.num_classes, activation='softmax')(x)

# define new model
model = Model(inputs=input_tensor, outputs=x)

This also explains the difference in performance between two approaches: (a) training the model with frozen layers and evaluating it on a validation/test set, and (b) saving bottleneck features and training a classifier on the cached features. With model.predict, the internal backend learning-phase flag (the one controlled by set_learning_phase) is set to 0.

More information here:

Pull request to change this behavior (not accepted): https://github.com/keras-team/keras/pull/9965

Similar thread: https://datascience.stackexchange.com/questions/47966/over-fitting-in-transfer-learning-with-small-dataset/72436#72436

Answer 4

I am also working on a very small dataset and encountered the same problem: validation accuracy gets stuck at some point although the training accuracy keeps going up. I also noticed that my validation loss was increasing over time. FYI, I am using ResNet50 and InceptionV3 models.

After some digging on the internet, I found a discussion on GitHub that connects this problem to the implementation of Batch Normalization layers in Keras. The problem is encountered when applying transfer learning and fine-tuning the network. I am not sure if you have the same problem, but I have added the GitHub link below, where you can read more about it and try some tests that will help you understand whether you are affected.

GitHub link to the pull request and discussion

Answer 5

The problem is that the dataset is too small for each class: 100k examples / 1000 classes = ~100 examples per class. That is too few. Your network can memorize all your examples in its weight matrices, but for generalization you need a lot more examples. Try using only the most common classes and see what happens.
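
One possible way to try that suggestion is sketched below. It assumes labels are available as plain class names or integers (not one-hot vectors, which is what the question's generator uses); the helper keep_most_common and the toy file names are made up for illustration.

```python
from collections import Counter

def keep_most_common(paths, labels, n_classes):
    """Keep only samples whose label is among the n_classes most frequent labels."""
    top = {label for label, _ in Counter(labels).most_common(n_classes)}
    kept = [(p, l) for p, l in zip(paths, labels) if l in top]
    return [p for p, _ in kept], [l for _, l in kept]

# Average number of images per class in the question.
print(100_000 / 1000)

# Tiny made-up example (file names and labels are hypothetical).
paths = ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg", "f.jpg"]
labels = ["oak", "oak", "oak", "fern", "fern", "moss"]
sub_paths, sub_labels = keep_most_common(paths, labels, n_classes=2)
print(sub_paths, sub_labels)
```

Training on such a subset first makes it easier to tell whether the poor validation accuracy comes from too few examples per class or from something else in the pipeline.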

Answer 6

Here is some explanation regarding fine-tuning and transfer learning, according to Stanford University:

Very different dataset and very little data compared with the ImageNet dataset: try a linear classifier from different stages.

So, to summarize:

Since the dataset is very small, you may want to extract the features from an earlier layer, train a classifier on top of them, and check whether the problem still exists.
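
A rough sketch of the "linear classifier on extracted features" idea follows. Everything here is illustrative: the features are synthetic Gaussian clusters standing in for cached network activations, and a plain least-squares classifier is swapped in for whichever linear model you would actually use.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_features, n_per_class = 3, 16, 50

# Stand-in for cached features: one Gaussian cluster per class.
centers = rng.normal(scale=5.0, size=(n_classes, n_features))
y = np.repeat(np.arange(n_classes), n_per_class)
X = centers[y] + rng.normal(size=(len(y), n_features))

# Shuffle, then split into train and test parts.
order = rng.permutation(len(y))
X, y = X[order], y[order]
X_train, y_train, X_test, y_test = X[:100], y[:100], X[100:], y[100:]

def add_bias(features):
    """Append a constant column so the linear model has an intercept."""
    return np.hstack([features, np.ones((len(features), 1))])

# Least-squares linear classifier: regress one-hot targets, predict by argmax.
targets = np.eye(n_classes)[y_train]
weights, *_ = np.linalg.lstsq(add_bias(X_train), targets, rcond=None)
pred = np.argmax(add_bias(X_test) @ weights, axis=1)
print("test accuracy:", (pred == y_test).mean())
```

With real data, X would be the activations saved from the chosen layer, and comparing the accuracy obtained from different layers shows which stage transfers best.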

6 answers

Answer 1  6  2018-07-04 19:15:49
Answer 2  4  2018-10-23 05:56:52
Answer 3  4  2020-04-16 14:11:36
Answer 4  1  2018-05-31 11:16:18
Answer 5  0  2018-05-16 14:02:19
Answer 6  0  2018-05-19 00:09:58
