Massive overfit during resnet50 transfer learning

Question

This is my first attempt at doing something with CNNs, so I am probably doing something very stupid - but can't figure out where I am wrong...

The model seems to be learning fine, but the validation accuracy is not improving (ever, even after the first epoch), and the validation loss is actually increasing with time. It doesn't look like I am overfitting (after 1 epoch?), so something must be off in some other way.

typical network behaviour

I am training a CNN: I have ~100k images of various plants (1000 classes) and want to fine-tune ResNet50 to create a multiclass classifier. Images are of various sizes, and I load them like so:

import numpy as np
from keras.preprocessing import image

def path_to_tensor(img_path):
    # loads RGB image as PIL.Image.Image type
    img = image.load_img(img_path, target_size=(IMG_HEIGHT, IMG_HEIGHT))
    # convert PIL.Image.Image type to 3D tensor with shape (IMG_HEIGHT, IMG_HEIGHT, 3)
    x = image.img_to_array(img)
    # convert 3D tensor to 4D tensor with shape (1, IMG_HEIGHT, IMG_HEIGHT, 3) and return 4D tensor
    return np.expand_dims(x, axis=0)

def paths_to_tensor(img_paths):
    list_of_tensors = [path_to_tensor(img_path) for img_path in img_paths] #can use tqdm(img_paths) for data
    return np.vstack(list_of_tensors)

The database is large (it does not fit into memory), so I had to create my own generator to handle both reading from disk and augmentation. (I know Keras has .flow_from_directory(), but my data is not structured this way: it is just a dump of 100k images mixed with 100k metadata files.) I probably should have written a script to structure them better instead of creating my own generators, but the problem is likely somewhere else.

The generator version below doesn't do any augmentation for the time being - just rescaling:

from keras.preprocessing.image import ImageDataGenerator

def generate_batches_from_train_folder(images_to_read, labels, batchsize = BATCH_SIZE):

    #Generator that returns batches of images ('xs') and labels ('ys') from the train folder
    #:param string filepath: Full filepath of files to read - this needs to be a list of image files
    #:param np.array: list of all labels for the images_to_read - those need to be one-hot-encoded
    #:param int batchsize: Size of the batches that should be generated.
    #:return: (ndarray, ndarray) (xs, ys): Yields a tuple which contains a full batch of images and labels. 

    dimensions = (BATCH_SIZE, IMG_HEIGHT, IMG_HEIGHT, 3)

    train_datagen = ImageDataGenerator(
        rescale=1./255,
        #rotation_range=20,
        #zoom_range=0.2, 
        #fill_mode='nearest',
        #horizontal_flip=True
    )

    # needs to be on a infinite loop for the generator to work
    while 1:
        filesize = len(images_to_read)

        # count how many entries we have read
        n_entries = 0
        # as long as we haven't read all entries from the file: keep reading
        while n_entries < (filesize - batchsize):

            # start the next batch at index 0
            # create numpy arrays of input data (features) 
            # - this is already shaped as a tensor (output of the support function paths_to_tensor)
            xs = paths_to_tensor(images_to_read[n_entries : n_entries + batchsize])

            # and label info. Contains 1000 labels in my case for each possible plant species
            ys = labels[n_entries : n_entries + batchsize]

            # we have read one more batch from this file
            n_entries += batchsize

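            #perform online augmentation on the xs and ys
            augmented_generator = train_datagen.flow(xs, ys, batch_size = batchsize)

            # yield inside the inner loop so that every batch is emitted,
            # not only the last batch of each pass over the data
            yield next(augmented_generator)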
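
A minimal sketch of the batching loop in isolation, with the yield placed inside the inner loop so that every batch is emitted rather than one per pass over the data. The helper name batch_slices is hypothetical, and image loading is left out: it only produces index ranges that could be used to slice images_to_read and labels.

```python
from itertools import islice


def batch_slices(n_items, batch_size):
    """Endlessly yield (start, stop) index pairs covering full batches."""
    while True:  # Keras-style generators are expected to loop forever
        start = 0
        # <= keeps the final full batch; a trailing partial batch is skipped
        while start + batch_size <= n_items:
            yield start, start + batch_size  # one yield per batch
            start += batch_size


# Ten items in batches of four: two full batches per pass, then it wraps around.
first_five = list(islice(batch_slices(10, 4), 5))
print(first_five)
```

If the yield sat one indentation level further out, each pass over the data would produce a single batch only, which is worth ruling out before looking at the model itself.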

This is how I define my model:

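from keras import optimizers
from keras.applications.resnet50 import ResNet50
from keras.layers import Dense, Dropout, Flatten
from keras.models import Model

def get_model():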

    # define the model
    base_net = ResNet50(input_shape=DIMENSIONS, weights='imagenet', include_top=False)

    # Freeze the layers which you don't want to train. Here I am freezing all of them
    for layer in base_net.layers:
        layer.trainable = False

    x = base_net.output

    #for resnet50
    x = Flatten()(x)
    x = Dense(512, activation="relu")(x)
    x = Dropout(0.5)(x)
    x = Dense(1000, activation='softmax', name='predictions')(x)

    model = Model(inputs=base_net.input, outputs=x)

    # compile the model 
    model.compile(
        loss='categorical_crossentropy',
        optimizer=optimizers.Adam(1e-3),
        metrics=['acc'])

    return model

So, as a result I have 1,562,088 trainable parameters for roughly 70k images
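
A quick check of that count. It works out exactly if the frozen base hands a 2048-value vector per image to the new head; the 2048 is an assumption inferred from the total, and the real flattened size depends on the input size and the Keras version. The helper dense_params is hypothetical.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


# Assumption: the frozen base flattens to 2048 features per image.
hidden = dense_params(2048, 512)   # Dense(512, activation="relu")
output = dense_params(512, 1000)   # Dense(1000, activation="softmax")
total = hidden + output
print(hidden, output, total)
```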

I then use a 5-fold cross validation, but the model doesn't work on any of the folds, so I will not be including the full code here, the relevant bit is this:

trial_fold = temp_model.fit_generator(
                train_generator,
                steps_per_epoch = len(X_train_path) // BATCH_SIZE,
                epochs = 50,
                verbose = 1,
                validation_data = (xs_v,ys_v),#valid_generator,
                #validation_steps= len(X_valid_path) // BATCH_SIZE,
                callbacks = callbacks,
                shuffle=True)

I have done various things - made sure my generator is actually working, tried to play with the last few layers of the network by reducing the size of the fully connected layer, tried augmentation - nothing helps...

I don't think the number of parameters in the network is too large - I know other people have done pretty much the same thing and got accuracy closer to 0.5, but my models seem to be overfitting like crazy. Any ideas on how to tackle this will be much appreciated!

Update 1:

I have decided to stop reinventing stuff and sorted my files to work with the .flow_from_directory() procedure. To make sure I am importing the right format (triggered by the Ioannis Nasios comment below), I made sure to use preprocess_input() from Keras's resnet50 application.

I also decided to check whether the model is actually producing something useful: I computed bottleneck features for my dataset and then used a random forest to predict the classes. It did work, and I got an accuracy of around 0.4.

So, I guess I definitely had a problem with the input format of my images. As a next step, I will fine-tune the model (with a new top layer) to see if the problem remains...

Update 2:

I think the problem was with image preprocessing. I ended up not fine-tuning in the end: I just extracted the bottleneck layer features and trained linear_SVC() on them, which gave an accuracy of around 60% on the training set and around 45% on the test set.

Answer 1

You need to use the preprocessing_function argument in ImageDataGenerator.

 train_datagen = ImageDataGenerator(preprocessing_function=keras.applications.resnet50.preprocess_input)

This will ensure that your images are pre-processed as expected for the pre-trained network you are using.
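
To see why the 1./255 rescaling in the question does not match, here is a NumPy-only sketch of the two input conventions. It assumes that ResNet50's preprocess_input applies the "caffe"-style transform (RGB to BGR, subtract the ImageNet channel means, no division). The function names and the mean constant are written out for illustration only; this is not the Keras implementation itself.

```python
import numpy as np

# ImageNet per-channel means in BGR order (assumed "caffe"-style convention).
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def rescale_only(rgb_batch):
    """What rescale=1./255 produces: pixel values mapped to [0, 1]."""
    return rgb_batch.astype(np.float32) / 255.0


def caffe_style(rgb_batch):
    """Illustrative re-implementation: RGB -> BGR, then subtract channel means."""
    bgr = rgb_batch.astype(np.float32)[..., ::-1]
    return bgr - IMAGENET_BGR_MEAN


# One mid-grey 2x2 RGB image, shape (1, 2, 2, 3).
batch = np.full((1, 2, 2, 3), 128, dtype=np.uint8)
print(rescale_only(batch)[0, 0, 0])  # values in [0, 1]
print(caffe_style(batch)[0, 0, 0])   # zero-centred values on the 0-255 scale
```

The two outputs differ in scale, offset and channel order, so frozen ImageNet weights see inputs far from what they were trained on when only the rescaling is applied.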

Answer 2

Have you found a workaround for your problem? If not, this might be an issue with the batch norm layers in your ResNet. I have faced a similar issue, because in Keras the batch norm layer behaves very differently during training and testing. You can freeze all BN layers by calling them with training=False:

x = BatchNormalization()(x, training=False)

and then try to retrain your network on the same dataset. One more thing to keep in mind: during training you should set the learning phase flag to 1,

import keras.backend as K
K.set_learning_phase(1)

and during testing set this flag to 0. I think it should work after making the above changes.

If you have found any other solution to the problem, please post it here so that others can benefit from it.

Thank you.

Answer 3

I implemented various architectures for transfer learning and observed that models containing BatchNorm layers (e.g. Inception, ResNet, MobileNet) perform a lot worse during evaluation (validation/test) than models without BatchNorm layers (e.g. VGG) on my custom dataset: ~30% compared to >95% test accuracy. Furthermore, this problem does not occur when saving bottleneck features and using them for classification. There are already a few blog entries, forum threads, issues and pull requests on this topic, and it turns out that a frozen BatchNorm layer uses the original dataset's (ImageNet) statistics rather than the new dataset's statistics:

Assume you are building a Computer Vision model but you don't have enough data, so you decide to use one of the pre-trained CNNs of Keras and fine-tune it. Unfortunately, by doing so you get no guarantees that the mean and variance of your new dataset inside the BN layers will be similar to the ones of the original dataset. Remember that at the moment, during training your network will always use the mini-batch statistics either the BN layer is frozen or not; also during inference you will use the previously learned statistics of the frozen BN layers. As a result, if you fine-tune the top layers, their weights will be adjusted to the mean/variance of the new dataset. Nevertheless, during inference they will receive data which are scaled differently because the mean/variance of the original dataset will be used.

cited from http://blog.datumbox.com/the-batch-normalization-layer-of-keras-is-broken/
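
The mismatch described in the quote can be shown with plain arithmetic. The sketch below uses made-up activations and made-up stored statistics (both hypothetical) and a simplified batch norm without the learned scale and shift.

```python
from statistics import mean, pvariance


def batch_norm(values, mu, var, eps=1e-3):
    """Normalise values with the given statistics (gamma=1, beta=0)."""
    return [(v - mu) / (var + eps) ** 0.5 for v in values]


# Hypothetical activations of one channel on the new dataset ...
new_activations = [4.0, 5.0, 6.0, 7.0, 8.0]
# ... and hypothetical moving statistics stored from the original dataset.
stored_mean, stored_var = 0.0, 1.0

# Training mode: statistics come from the mini-batch itself.
train_out = batch_norm(new_activations,
                       mean(new_activations),
                       pvariance(new_activations))
# Inference mode: the stored statistics are used instead.
infer_out = batch_norm(new_activations, stored_mean, stored_var)

print(round(mean(train_out), 3), round(mean(infer_out), 3))
```

The layers trained on top see roughly zero-centred inputs during training and strongly shifted inputs during inference, which is the kind of gap between training and validation metrics described above.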

What fixed the problem for me was to freeze all layers and then unfreeze all BatchNormalization layers, so that they use the new dataset's statistics instead of the original statistics:

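from keras.applications import inception_v3
from keras.layers import Dense, Dropout, Input
from keras.models import Model

# build model (train_generator is assumed to be defined elsewhere, e.g. via flow_from_directory)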
input_tensor = Input(shape=train_generator.image_shape)
base_model = inception_v3.InceptionV3(input_tensor=input_tensor,
                                      include_top=False,
                                      weights='imagenet',
                                      pooling='avg')
x = base_model.output

# freeze all layers in the base model
base_model.trainable = False

# un-freeze the BatchNorm layers
for layer in base_model.layers:
    if "BatchNormalization" in layer.__class__.__name__:
        layer.trainable = True

# add custom layers
x = Dense(1024, activation='relu')(x)
x = Dropout(0.5)(x)
x = Dense(train_generator.num_classes, activation='softmax')(x)

# define new model
model = Model(inputs=input_tensor, outputs=x)

This also explains the difference in performance between two approaches: training the model with frozen layers and evaluating it on a validation/test set, versus saving bottleneck features (with model.predict the internal backend flag set_learning_phase is set to 0) and training a classifier on the cached bottleneck features.

More information here:

Pull request to change this behavior (not-accepted): https://github.com/keras-team/keras/pull/9965

Similar thread: https://datascience.stackexchange.com/questions/47966/over-fitting-in-transfer-learning-with-small-dataset/72436#72436

Answer 4

I am also working on a very small dataset and encountered the same problem of validation accuracy being stuck at some point although the training accuracy keeps going higher. I also noticed that my validation loss was getting higher as well over time. FYI, I am using Resnet 50 and InceptionV3 models.

After some digging on the internet, I found a discussion on GitHub that connects this problem to the implementation of Batch Normalization layers in Keras. The problem shows up when applying transfer learning and fine-tuning the network. I am not sure if you have the same problem, but I have added the GitHub link below, where you can read more about it and try some tests that will help you work out whether you are affected.

Github link to the pull request and discussion

Answer 5

The problem is that the dataset is too small for each class: 100k examples / 1000 classes = ~100 examples per class. That is too few. Your network can memorize all your examples in its weight matrices, but for generalization you need a lot more examples. Try using only the most common classes and see what happens.
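
One way to try the "most common classes" experiment is to filter the file list before building the generators. This is a minimal sketch that assumes the labels are still plain class names (before one-hot encoding); the helper name keep_most_common and the toy file names are hypothetical.

```python
from collections import Counter


def keep_most_common(paths, labels, n_classes):
    """Keep only the samples whose label is among the n_classes most frequent."""
    top = {label for label, _ in Counter(labels).most_common(n_classes)}
    kept = [(p, l) for p, l in zip(paths, labels) if l in top]
    return [p for p, _ in kept], [l for _, l in kept]


# Toy data: file names and species labels are made up.
paths = ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg", "f.jpg"]
labels = ["oak", "oak", "oak", "fern", "fern", "moss"]
sub_paths, sub_labels = keep_most_common(paths, labels, n_classes=2)
print(sub_paths, sub_labels)
```

If accuracy on the reduced problem looks healthy, the per-class sample count is a likely culprit; if it is still stuck, the cause is probably elsewhere in the pipeline.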

Answer 6

Here is some explanation regarding fine-tuning and transfer learning, according to Stanford University:

New dataset is very small and very different from the ImageNet dataset - try a linear classifier on features from different stages of the network.

So to summarize:

Since the dataset is very small, you may want to extract the features from an earlier layer, train a classifier on top of them, and check whether the problem still exists.

6 answers

solution1
6 2018-07-04 19:15:49

solution2
4 2018-10-23 05:56:52

solution3
4 2020-04-16 14:11:36

solution4
1 2018-05-31 11:16:18

solution5
0 2018-05-16 14:02:19

solution6
0 2018-05-19 00:09:58
