How to do Transfer Learning without ImageNet weights?

This is a description of my project:

Dataset1: The bigger dataset, contains binary classes of images.

Dataset2: Contains 2 classes that are very similar in appearance to those in Dataset1. I want to build a model that uses transfer learning: learn from Dataset1, then apply those weights to Dataset2 with a lower learning rate.

Therefore I'm looking to train the entire VGG16 on Dataset1, then use transfer learning to fine-tune the last layers on Dataset2. I do not want to use the pre-trained ImageNet weights. This is the code I am using, and I have saved the weights from it:

from tensorflow.keras.layers import Input, Lambda, Dense, Flatten
from tensorflow.keras.models import Model
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.preprocessing import image
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
import numpy as np
from glob import glob
import matplotlib.pyplot as plt

vgg = VGG16(input_shape=(244, 244, 3), weights=None, include_top=False)

# don't train existing weights
for layer in vgg.layers:
    layer.trainable = False
x = Flatten()(vgg.output)   

import tensorflow.keras
prediction = tensorflow.keras.layers.Dense(2, activation='softmax')(x)

model = Model(inputs=vgg.input, outputs=prediction)


from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)

test_datagen = ImageDataGenerator(rescale = 1./255)

training_set = train_datagen.flow_from_directory('chest_xray/train',
                                                 target_size = (224, 224),
                                                 batch_size = 32,
                                                 class_mode = 'categorical')

test_set = train_datagen.flow_from_directory('chest_xray/test',
                                                 target_size = (224, 224),
                                                 batch_size = 32,
                                                 class_mode = 'categorical')

# fit the model
r = model.fit_generator(



Based on your query, it seems that the number of classes won't be different in Dataset2. At the same time, you don't want to use ImageNet weights. In that case, you don't need to map or store the weights (as described below). Just load the model and the weights and train on Dataset2: freeze all the layers trained on Dataset1 and train the last layer on Dataset2. It's really straightforward.

You don't need everything in the response below, but I am keeping it for future reference.

Here is a small demonstration of what you probably need; I hope it gives you some insight. We will train on the CIFAR-10 data set, which has 10 classes, and then use the result for transfer learning on a different data set, which may have a different input size and a different number of classes.
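The workflow described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the asker's actual pipeline: `build_model` is a hypothetical tiny network standing in for VGG16, the arrays are random stand-ins for Dataset1 and Dataset2, and the file name and the `1e-4` learning rate are arbitrary illustrative choices.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def build_model():
    # Hypothetical small network standing in for VGG16 (assumption).
    inputs = tf.keras.Input(shape=(32, 32, 3))
    x = tf.keras.layers.Conv2D(8, 3, activation="relu")(inputs)
    x = tf.keras.layers.GlobalMaxPooling2D()(x)
    outputs = tf.keras.layers.Dense(2, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)


# Random stand-ins for Dataset1 and Dataset2 (assumption, two classes each).
rng = np.random.default_rng(0)
x1 = rng.random((64, 32, 32, 3)).astype("float32")
y1 = tf.keras.utils.to_categorical(rng.integers(0, 2, 64), num_classes=2)
x2 = rng.random((32, 32, 32, 3)).astype("float32")
y2 = tf.keras.utils.to_categorical(rng.integers(0, 2, 32), num_classes=2)

# Phase 1: train every layer on Dataset1 and save the weights.
model1 = build_model()
model1.compile(optimizer="adam", loss="categorical_crossentropy")
model1.fit(x1, y1, batch_size=16, epochs=1, verbose=0)
weights_path = os.path.join(tempfile.mkdtemp(), "dataset1.weights.h5")
model1.save_weights(weights_path)

# Phase 2: same architecture, load the Dataset1 weights, freeze all but the last layer.
model2 = build_model()
model2.load_weights(weights_path)
for layer in model2.layers[:-1]:
    layer.trainable = False

# Compile after changing `trainable`, using a lower learning rate for fine-tuning.
model2.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss="categorical_crossentropy",
)

conv_before = model2.layers[1].get_weights()[0].copy()
model2.fit(x2, y2, batch_size=16, epochs=1, verbose=0)
conv_after = model2.layers[1].get_weights()[0]

# The frozen convolution kernel should be untouched by the second fit.
print(np.array_equal(conv_before, conv_after))
```

Setting `trainable` before calling `compile` matters here, because the compiled training step is built from the trainable state at that point.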

Preparing the CIFAR (10 Classes)

import numpy as np
import tensorflow as tf 
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, Dropout

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# train set / data 
x_train = x_train.astype('float32') / 255

# validation set / data 
x_test = x_test.astype('float32') / 255

# train set / target 
y_train = tf.keras.utils.to_categorical(y_train, num_classes=10)
# validation set / target 
y_test = tf.keras.utils.to_categorical(y_test, num_classes=10)

print(x_train.shape, y_train.shape) 
print(x_test.shape, y_test.shape)  
(50000, 32, 32, 3) (50000, 10)
(10000, 32, 32, 3) (10000, 10)


# declare input shape 
input = tf.keras.Input(shape=(32,32,3))
# Block 1
x = tf.keras.layers.Conv2D(32, 3, strides=2, activation="relu")(input)
x = tf.keras.layers.MaxPooling2D(3)(x)

# Now that we apply global max pooling.
gap = tf.keras.layers.GlobalMaxPooling2D()(x)

# Finally, we add a classification layer.
output = tf.keras.layers.Dense(10, activation='softmax')(gap)

# bind all
func_model = tf.keras.Model(input, output)

Model: "functional_3"
Layer (type)                 Output Shape              Param #   
input_2 (InputLayer)         [(None, 32, 32, 3)]       0         
conv2d_1 (Conv2D)            (None, 15, 15, 32)        896       
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 32)          0         
global_max_pooling2d_1 (Glob (None, 32)                0         
dense_1 (Dense)              (None, 10)                330       
Total params: 1,226
Trainable params: 1,226
Non-trainable params: 0

Run the model to get some weight matrices as follows:

# compile 
print('\nFunctional API')
func_model.compile(
          loss      = tf.keras.losses.CategoricalCrossentropy(),
          metrics   = [tf.keras.metrics.CategoricalAccuracy()],
          optimizer = tf.keras.optimizers.Adam())
# fit 
func_model.fit(x_train, y_train, batch_size=128, epochs=1)

Transfer Learning

Let's use it for MNIST. It also has 10 classes, but since we need a different number of classes, we will make even and odd categories from it (2 classes). Below is how we prepare these data sets:

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# train set / data 
x_train = np.expand_dims(x_train, axis=-1)
x_train = np.repeat(x_train, 3, axis=-1)
x_train = x_train.astype('float32') / 255
# train set / target 
y_train = tf.keras.utils.to_categorical((y_train % 2 == 0).astype(int), num_classes=2)

# validation set / data 
x_test = np.expand_dims(x_test, axis=-1)
x_test = np.repeat(x_test, 3, axis=-1)
x_test = x_test.astype('float32') / 255
# validation set / target 

y_test = tf.keras.utils.to_categorical((y_test % 2 == 0).astype(int), num_classes=2)

print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)  
(60000, 28, 28, 3) (60000, 2)
(10000, 28, 28, 3) (10000, 2)
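To make the even/odd label encoding concrete, here is a small worked example on a handful of digits. It is only a sketch: the `digits` array is made up, and `np.eye(2)[...]` is used as a NumPy stand-in for `to_categorical` so the example runs without TensorFlow.

```python
import numpy as np

# A few hypothetical MNIST labels (assumption, for illustration only).
digits = np.array([0, 1, 2, 7, 8])

# 1 for even digits, 0 for odd digits.
is_even = (digits % 2 == 0).astype(int)

# One-hot encode: column 0 means "odd", column 1 means "even".
one_hot = np.eye(2, dtype="float32")[is_even]

print(is_even)
print(one_hot)
```

So with this encoding, class index 1 corresponds to even digits and class index 0 to odd digits.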

If you're familiar with using ImageNet pretrained weights in Keras models, you have probably used include_top. Setting it to False loads a weight file that omits the top (classifier) part of the pretrained model. Here we need to do that (kind of) manually: grab the weight matrices up to, but not including, the final classification layer (in our case Dense(10, softmax)), put them in a new instance of the base model, and add a new classifier layer (in our case Dense(2, softmax)).

for i, layer in enumerate(func_model.layers):
    print(i,'\t',layer.trainable,'\t  :',layer.name)

  Train_Bool  : Layer Names
0    True     : input_1
1    True     : conv2d
2    True     : max_pooling2d
3    True     : global_max_pooling2d # < we go till here to grab the weight and biases
4    True     : dense  # 10 classes (from previous model)

Get Weights

sparsified_weights = []
for w in func_model.get_layer(name='global_max_pooling2d').get_weights():
    sparsified_weights.append(w)

By that, we map the weights from the old model, excluding the classifier layer (Dense). Note that we go only as far as the global max pooling layer, which sits right before the classifier.

Now we will create a new model that is the same as the old one except for the last layer (the Dense layer with 10 units), and at the same time add a new Dense layer with 2 units.

predictions    = Dense(2, activation='softmax')(func_model.layers[-2].output)
new_func_model = Model(inputs=func_model.inputs, outputs = predictions) 

And now we can set the weights on the new model as follows:

new_func_model.get_layer(name='global_max_pooling2d').set_weights(sparsified_weights)


You can check to verify as follows; all will be the same except the last layer.

func_model.get_weights()      # last layer, Dense (10)
new_func_model.get_weights()  # last layer, Dense (2)
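The check above can be made explicit by comparing the weight arrays directly. The following is a self-contained sketch that rebuilds the same small architecture under the hypothetical names `old_model` and `new_model` (untrained here, which is enough to show which weights are shared).

```python
import numpy as np
import tensorflow as tf

# Same small architecture as in the demonstration, rebuilt so the sketch runs alone.
inputs = tf.keras.Input(shape=(32, 32, 3))
x = tf.keras.layers.Conv2D(32, 3, strides=2, activation="relu")(inputs)
x = tf.keras.layers.MaxPooling2D(3)(x)
x = tf.keras.layers.GlobalMaxPooling2D()(x)
outputs = tf.keras.layers.Dense(10, activation="softmax")(x)
old_model = tf.keras.Model(inputs, outputs)

# New 2-class head attached to the output of the pooling layer.
new_outputs = tf.keras.layers.Dense(2, activation="softmax")(old_model.layers[-2].output)
new_model = tf.keras.Model(old_model.inputs, new_outputs)

old_w = old_model.get_weights()
new_w = new_model.get_weights()

# Everything except the final Dense kernel and bias should match exactly.
shared_equal = all(np.array_equal(a, b) for a, b in zip(old_w[:-2], new_w[:-2]))
print(shared_equal)

# Only the classifier kernels differ in shape.
print(old_w[-2].shape, new_w[-2].shape)
```

Because the new model is built from the old model's layers, both models hold the very same layer objects. Training one therefore also updates the shared layers of the other; build a fresh model and copy the weights if the two need to stay independent.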

Now you can train the model on the new data set, which in our case is MNIST:

new_func_model.compile(optimizer='adam', loss='categorical_crossentropy')

Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 32, 32, 3)]       0         
conv2d (Conv2D)              (None, 15, 15, 32)        896       
max_pooling2d (MaxPooling2D) (None, 5, 5, 32)          0         
global_max_pooling2d (Global (None, 32)                0         
dense_6 (Dense)              (None, 2)                 66        
Total params: 962
Trainable params: 962
Non-trainable params: 0

# compile 
print('\nFunctional API')
new_func_model.compile(
          loss      = tf.keras.losses.CategoricalCrossentropy(),
          metrics   = [tf.keras.metrics.CategoricalAccuracy()],
          optimizer = tf.keras.optimizers.Adam())
# fit 
new_func_model.fit(x_train, y_train, batch_size=128, epochs=1)
WARNING:tensorflow:Model was constructed with shape (None, 32, 32, 3) for input Tensor("input_1:0", shape=(None, 32, 32, 3), dtype=float32), but it was called on an input with incompatible shape (None, 28, 28, 3).
WARNING:tensorflow:Model was constructed with shape (None, 32, 32, 3) for input Tensor("input_1:0", shape=(None, 32, 32, 3), dtype=float32), but it was called on an input with incompatible shape (None, 28, 28, 3).
469/469 [==============================] - 1s 3ms/step - loss: 0.6453 - categorical_accuracy: 0.6447
<tensorflow.python.keras.callbacks.History at 0x7f7af016feb8>

There are a couple of issues. You are not using the ImageNet weights (I can't imagine why not), and then you set all the layers of the VGG network as untrainable. So they start out with random weights and will stay random. Then you add the Flatten and prediction layers and try to train, so all you will be training is a single dense layer. I doubt that will work very well, but I guess it will learn something. I would use the ImageNet weights as a minimum; I also prefer to train the entire model to get the best results. The next issue is the code

test_set = train_datagen.flow_from_directory('chest_xray/test',
                                                 target_size = (224, 224),
                                                 batch_size = 32,
                                                 class_mode = 'categorical')
train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)

You do not want to perform image augmentation on a test set, which is what this code does, so use

test_set =  ImageDataGenerator(rescale = 1./255).flow_from_directory('chest_xray/test',
                                                 target_size = (224, 224),
                                                 batch_size = 32,
                                                 class_mode = 'categorical')

Next, you are using model.fit_generator. That will work for now, but it has been deprecated since TensorFlow 2.1, where model.fit started accepting generators directly, so use model.fit instead. What you call the test set is normally called the validation set, so that's OK. You have specified batch_size=32, so take care with steps_per_epoch in model.fit: the length of the generator is already the number of batches per epoch.

# what you want is
steps_per_epoch = len(training_set)  # already counted in batches of 32, do not divide by 32 again
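The arithmetic behind steps per epoch can be checked without Keras. This is a small sketch with a made-up sample count; the Keras directory iterators report their `len()` in batches, which corresponds to the ceiling value below.

```python
import math

num_samples = 1000  # hypothetical number of training images (assumption)
batch_size = 32

# Batches needed to see every sample once; the last batch is partial.
steps_full = math.ceil(num_samples / batch_size)

# Flooring instead drops the final partial batch.
steps_floor = num_samples // batch_size

print(steps_full, steps_floor)
```

Dividing the generator's length by the batch size a second time would shrink each epoch to a small fraction of the data.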

Once you have trained your model, it has the weights you want to use for Dataset2. Just create new generators for Dataset2 and retrain it with model.fit.
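The first point of this answer, that freezing a randomly initialised VGG16 leaves only the final dense layer to learn, can be checked by counting parameters. This is a sketch under assumptions: the input size of 64x64 is chosen only to keep the example light, and `count_params` is a hypothetical helper.

```python
import numpy as np
import tensorflow as tf


def count_params(weights):
    # Hypothetical helper: total number of scalar values in a list of weights.
    return sum(int(np.prod(list(w.shape))) for w in weights)


# Randomly initialised VGG16 base, as in the question (weights=None).
vgg = tf.keras.applications.VGG16(
    input_shape=(64, 64, 3), weights=None, include_top=False
)
x = tf.keras.layers.Flatten()(vgg.output)
outputs = tf.keras.layers.Dense(2, activation="softmax")(x)
model = tf.keras.Model(vgg.input, outputs)

# Case 1: every VGG layer frozen, so only the Dense head can learn.
for layer in vgg.layers:
    layer.trainable = False
trainable_frozen = count_params(model.trainable_weights)

# Case 2: every VGG layer trainable, as needed when training from scratch on Dataset1.
for layer in vgg.layers:
    layer.trainable = True
trainable_all = count_params(model.trainable_weights)

print(trainable_frozen, trainable_all)
```

For the Dataset1 phase without ImageNet weights, the second case is the one that lets the convolutional layers learn anything; freezing only makes sense afterwards, when fine-tuning on Dataset2.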

