
Why is the model not learning with pretrained vgg16 in keras?

I am using the pre-trained VGG16 model that ships with Keras and applying it to the SVHN dataset, which has 10 classes (the digits 0 to 9). The network is not learning and is stuck at 0.17 accuracy. I am doing something wrong but cannot identify what. My training setup is as follows:

import math

import tensorflow.keras as keras

vgg16 = keras.applications.vgg16.VGG16()

model = keras.Sequential()
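for layer in vgg16.layers:
   model.add(layer)  # loop body is missing in the source; presumably each VGG16 layer is copied into `model`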

for layer in model.layers:
   layer.trainable = False

model.add(keras.layers.Dense(10, activation = "softmax"))

train_optimizer_rmsProp = keras.optimizers.RMSprop(lr=0.0001)
model.compile(loss="categorical_crossentropy", optimizer=train_optimizer_rmsProp, metrics=['accuracy'])
batch_size = 128*1

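data_generator = keras.preprocessing.image.ImageDataGenerator(
    rescale = 1./255
)

# The directory arguments are missing in the source; `train_dir` and `val_dir`
# are placeholder names. Other arguments (e.g. the batch size) may also have been dropped.
train_generator = data_generator.flow_from_directory(
        train_dir,
        target_size=(224, 224),
)

validation_generator = data_generator.flow_from_directory(
        val_dir,
        target_size=(224, 224),
)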

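# `val_split_length`, `num_train_samples` and `model_callbacks` are defined elsewhere.
history = model.fit_generator(
    train_generator,  # first argument is missing in the source; the training generator is assumed
    validation_data = validation_generator,
    validation_steps = math.ceil(val_split_length / batch_size),
    epochs = 15,
    steps_per_epoch = math.ceil(num_train_samples / batch_size),
    use_multiprocessing = True,
    workers = 8,
    callbacks = model_callbacks,
    verbose = 2
)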

What am I doing wrong? Is there something I am missing? I was expecting very high accuracy since the model carries weights from ImageNet, but it has been stuck at 0.17 accuracy since the first epoch.

I assume you're upsampling the 32x32 MNIST-like images to fit the VGG16 input. What you should do instead is remove all the dense layers. The weights of convolutional layers are agnostic to the image size, so without the dense layers the network accepts other input sizes (Keras requires at least 32x32 for VGG16).

You can do this like:

vgg16 = keras.applications.vgg16.VGG16(include_top=False, input_shape=(32, 32, 3))

I think this should be the default behaviour of the constructor.
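As a rough check on why a 32x32 input still works once the dense layers are gone, the spatial size can be traced through the five convolutional blocks. This is only a sketch: the helper name `vgg16_feature_map_sizes` is made up for illustration, and it assumes the standard VGG16 layout in which the 3x3 convolutions preserve the spatial size and each block ends with a 2x2, stride-2 max-pool.

```python
def vgg16_feature_map_sizes(input_size, num_blocks=5):
    """Return the spatial size after each block's 2x2, stride-2 max-pool.

    Assumes 'same'-padded 3x3 convolutions, so only pooling changes the size.
    """
    sizes = []
    size = input_size
    for _ in range(num_blocks):
        size = size // 2  # pooling halves the size, rounding down
        sizes.append(size)
    return sizes


sizes_32 = vgg16_feature_map_sizes(32)    # [16, 8, 4, 2, 1]
sizes_224 = vgg16_feature_map_sizes(224)  # [112, 56, 28, 14, 7]
print(sizes_32, sizes_224)
```

A 32x32 image therefore ends up as a 1x1 feature map (with 512 channels in VGG16), which is still a valid input for a small classifier head, whereas the original dense layers are tied to the 7x7 map produced by a 224x224 input.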

When you upsample an image, in the best case you are essentially blurring it. Here, each pixel of the original image becomes a 7x7 block of pixels in the upsampled one (224 / 32 = 7), while VGG16's filters are only 3 pixels wide. In other words, you're losing the image's features.
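The effect can be illustrated with a small NumPy sketch. It assumes nearest-neighbour resizing, where each pixel is simply replicated (other interpolation modes smooth the blocks rather than copy them), and `upsample_nearest` is a helper written for this illustration, not a Keras function.

```python
import numpy as np


def upsample_nearest(image, factor):
    """Repeat every pixel `factor` times along both spatial axes."""
    return np.repeat(np.repeat(image, factor, axis=0), factor, axis=1)


factor = 224 // 32  # 7 upsampled pixels per original pixel, per axis
small = np.arange(32 * 32, dtype=np.float32).reshape(32, 32)
big = upsample_nearest(small, factor)

# A 3x3 window that lies wholly inside one 7x7 block sees a single constant
# value, so a 3x3 filter placed there has no local structure to respond to.
window = big[2:5, 2:5]
window_is_flat = bool(np.all(window == window[0, 0]))
print(big.shape, window_is_flat)
```

Edges that were one pixel apart in the original are now seven pixels apart, so many 3x3 filter positions cover nothing but flat regions.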

It is not necessary to add 3 dense layers at the end like the original VGG16; you can try the single dense layer you already have in your code.
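Putting the suggestions together, a minimal sketch could look like the following. The function name `build_svhn_classifier` and its parameters are made up for illustration. The `Flatten` layer is an addition not mentioned above, assumed necessary so that the single dense layer receives a flat vector instead of a 1x1x512 feature map. The optimizer and loss mirror the question's settings.

```python
from tensorflow import keras


def build_svhn_classifier(weights="imagenet", num_classes=10):
    # Convolutional base only: include_top=False drops the three dense layers.
    base = keras.applications.vgg16.VGG16(
        include_top=False, weights=weights, input_shape=(32, 32, 3)
    )
    base.trainable = False  # freeze the pretrained convolutional weights

    model = keras.Sequential([
        base,
        keras.layers.Flatten(),  # (1, 1, 512) feature map -> 512 values
        keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        loss="categorical_crossentropy",
        optimizer=keras.optimizers.RMSprop(learning_rate=0.0001),
        metrics=["accuracy"],
    )
    return model
```

With this model, the generators would use `target_size=(32, 32)` so the images are fed at their native resolution rather than upsampled.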
