How to do Transfer Learning without ImageNet weights?

This is a description of my project:

Dataset1: the bigger data set; it contains two classes of images (binary classification).

Dataset2: contains 2 classes that are very similar in appearance to Dataset1. I want to build a model that uses transfer learning by learning from Dataset1 and then applies those weights, with a lower learning rate, on Dataset2.

Therefore I'm looking to train the entire VGG16 on dataset1, then use transfer learning to fine-tune the last layers for dataset2. I do not want to use the pre-trained ImageNet weights. This is the code I am using, and I have saved the weights from it:


from tensorflow.keras.layers import Input, Lambda, Dense, Flatten
from tensorflow.keras.models import Model
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.preprocessing import image
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
import numpy as np
from glob import glob
import matplotlib.pyplot as plt

vgg = VGG16(input_shape=(224, 224, 3), weights=None, include_top=False)  # 224 matches the generators below

# don't train existing weights
for layer in vgg.layers:
    layer.trainable = False
    
x = Flatten()(vgg.output)   

prediction = Dense(2, activation='softmax')(x)

model = Model(inputs=vgg.input, outputs=prediction)

model.compile(
  loss='categorical_crossentropy',
  optimizer='adam',
  metrics=['accuracy']
)


train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)

test_datagen = ImageDataGenerator(rescale = 1./255)

training_set = train_datagen.flow_from_directory('chest_xray/train',
                                                 target_size = (224, 224),
                                                 batch_size = 32,
                                                 class_mode = 'categorical')

test_set = train_datagen.flow_from_directory('chest_xray/test',
                                                 target_size = (224, 224),
                                                 batch_size = 32,
                                                 class_mode = 'categorical')

# fit the model
r = model.fit_generator(
  training_set,
  validation_data=test_set,
  epochs=5,
  steps_per_epoch=len(training_set),
  validation_steps=len(test_set)
)

model.save_weights('first_try.h5') 

Update

Based on your query, it seems that the number of classes won't be different in Dataset2. At the same time, you also don't want to use the ImageNet weights. In that case, you don't need to map or transfer the weights manually (as described below). Just load the model and its weights and train on Dataset2: freeze all the layers trained on Dataset1 and train only the last layer on Dataset2; really straightforward.
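A minimal sketch of that workflow, assuming the architecture from your code above and that the Dataset1 weights were saved to first_try.h5; the learning rate of 1e-4 is illustrative:

import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model
from tensorflow.keras.applications.vgg16 import VGG16

# rebuild the same architecture and load the Dataset1 weights
vgg = VGG16(input_shape=(224, 224, 3), weights=None, include_top=False)
x = Flatten()(vgg.output)
prediction = Dense(2, activation='softmax')(x)
model = Model(inputs=vgg.input, outputs=prediction)
model.load_weights('first_try.h5')

# freeze everything except the classifier head
for layer in model.layers[:-1]:
    layer.trainable = False

# recompile with a lower learning rate, then fit on the Dataset2 generators
model.compile(
    loss='categorical_crossentropy',
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    metrics=['accuracy'])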

In my response below, though you don't need the full information, I am keeping it anyway for future reference.


Here is a small demonstration of what you probably need; hope it gives you some insight. Here we will train on the CIFAR-10 data set, which has 10 classes, and try to use it for transfer learning on a different data set which has different input sizes and a different number of classes.

Preparing the CIFAR (10 Classes)

import numpy as np
import tensorflow as tf 
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, Dropout

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# train set / data 
x_train = x_train.astype('float32') / 255

# validation set / data 
x_test = x_test.astype('float32') / 255

# train set / target 
y_train = tf.keras.utils.to_categorical(y_train, num_classes=10)
# validation set / target 
y_test = tf.keras.utils.to_categorical(y_test, num_classes=10)

print(x_train.shape, y_train.shape) 
print(x_test.shape, y_test.shape)  
'''
(50000, 32, 32, 3) (50000, 10)
(10000, 32, 32, 3) (10000, 10)
'''

Model

# declare input shape 
input = tf.keras.Input(shape=(32,32,3))
# Block 1
x = tf.keras.layers.Conv2D(32, 3, strides=2, activation="relu")(input)
x = tf.keras.layers.MaxPooling2D(3)(x)

# Now we apply global max pooling.
gap = tf.keras.layers.GlobalMaxPooling2D()(x)

# Finally, we add a classification layer.
output = tf.keras.layers.Dense(10, activation='softmax')(gap)

# bind all
func_model = tf.keras.Model(input, output)

'''
Model: "functional_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_2 (InputLayer)         [(None, 32, 32, 3)]       0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 15, 15, 32)        896       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 32)          0         
_________________________________________________________________
global_max_pooling2d_1 (Glob (None, 32)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                330       
=================================================================
Total params: 1,226
Trainable params: 1,226
Non-trainable params: 0
'''

Run the model to get some weight matrices as follows:

# compile 
print('\nFunctional API')
func_model.compile(
          loss      = tf.keras.losses.CategoricalCrossentropy(),
          metrics   = tf.keras.metrics.CategoricalAccuracy(),
          optimizer = tf.keras.optimizers.Adam())
# fit 
func_model.fit(x_train, y_train, batch_size=128, epochs=1)

Transfer Learning

Let's use it for MNIST. It also has 10 classes, but since we want a different number of classes, we will make even and odd categories from it (2 classes). Below is how we will prepare these data sets:

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# train set / data 
x_train = np.expand_dims(x_train, axis=-1)
x_train = np.repeat(x_train, 3, axis=-1)
x_train = x_train.astype('float32') / 255
# train set / target 
y_train = tf.keras.utils.to_categorical((y_train % 2 == 0).astype(int), 
                                        num_classes=2)

# validation set / data 
x_test = np.expand_dims(x_test, axis=-1)
x_test = np.repeat(x_test, 3, axis=-1)
x_test = x_test.astype('float32') / 255
# validation set / target 

y_test = tf.keras.utils.to_categorical((y_test % 2 == 0).astype(int), 
                                       num_classes=2)

print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)  
'''
(60000, 28, 28, 3) (60000, 2)
(10000, 28, 28, 3) (10000, 2)
'''

If you're familiar with using ImageNet pretrained weights in Keras models, you probably use include_top. By setting it to False, we can easily load a weight file that has no top (classifier) information of the pretrained model. So here we need to do that manually, kind of. We need to grab the weight matrices up to the last activation layer (in our case Dense(10, softmax)), put them into a new instance of the base model, and add a new classifier layer (in our case Dense(2, softmax)).

for i, layer in enumerate(func_model.layers):
    print(i,'\t',layer.trainable,'\t  :',layer.name)

'''
  Train_Bool  : Layer Names
0    True     : input_1
1    True     : conv2d
2    True     : max_pooling2d
3    True     : global_max_pooling2d # < we go till here to grab the weight and biases
4    True     : dense  # 10 classes (from previous model)
'''

Get Weights

# collect the weights to carry over; note GlobalMaxPooling2D has no
# parameters of its own, so for this layer the list stays empty (for layers
# with parameters, e.g. Conv2D, get_weights() returns kernels and biases)
sparsified_weights = []
for w in func_model.get_layer(name='global_max_pooling2d').get_weights():
    sparsified_weights.append(w)

By that, we map over the weights from the old model, except for the classifier layer (Dense). Please note, here we grab the weights up to the pooling layer, which sits right before the classifier.

Now we will create a new model, the same as the old model except for the last layer (Dense(10)), and at the same time add a new Dense layer with 2 units.

predictions    = Dense(2, activation='softmax')(func_model.layers[-2].output)
new_func_model = Model(inputs=func_model.inputs, outputs = predictions) 

And now we can set the weights on the new model as follows:

new_func_model.get_layer(name='global_max_pooling2d').set_weights(sparsified_weights)

You can verify as follows; everything will be the same except the last layer.

func_model.get_weights()      # last layer, Dense (10)
new_func_model.get_weights()  # last layer, Dense (2)

Now you can train the model with the new data set; in our case that was MNIST:

new_func_model.compile(optimizer='adam', loss='categorical_crossentropy')
new_func_model.summary()

'''
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 32, 32, 3)]       0         
_________________________________________________________________
conv2d (Conv2D)              (None, 15, 15, 32)        896       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 5, 5, 32)          0         
_________________________________________________________________
global_max_pooling2d (Global (None, 32)                0         
_________________________________________________________________
dense_6 (Dense)              (None, 2)                 66        
=================================================================
Total params: 962
Trainable params: 962
Non-trainable params: 0
'''

# compile 
print('\nFunctional API')
new_func_model.compile(
          loss      = tf.keras.losses.CategoricalCrossentropy(),
          metrics   = tf.keras.metrics.CategoricalAccuracy(),
          optimizer = tf.keras.optimizers.Adam())
# fit 
new_func_model.fit(x_train, y_train, batch_size=128, epochs=1)
WARNING:tensorflow:Model was constructed with shape (None, 32, 32, 3) for input Tensor("input_1:0", shape=(None, 32, 32, 3), dtype=float32), but it was called on an input with incompatible shape (None, 28, 28, 3).
WARNING:tensorflow:Model was constructed with shape (None, 32, 32, 3) for input Tensor("input_1:0", shape=(None, 32, 32, 3), dtype=float32), but it was called on an input with incompatible shape (None, 28, 28, 3).
469/469 [==============================] - 1s 3ms/step - loss: 0.6453 - categorical_accuracy: 0.6447
<tensorflow.python.keras.callbacks.History at 0x7f7af016feb8>

A couple of issues. You are not using the ImageNet weights (I can't imagine why not), and you then set all the layers of the VGG network as untrainable. So you will start out with random weights, and they will stay random. Then you add the Flatten and prediction layers and try to train, so all you will actually be training is a single dense layer. I doubt that will work very well, but I guess it will learn something. I would use the ImageNet weights as a minimum, and I also prefer to train the entire model to get the best results.
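A minimal sketch of that suggestion, reusing the VGG16 constructor from your code (the 'imagenet' weights are the stock Keras ones):

# load ImageNet weights and leave every layer trainable for full fine-tuning
vgg = VGG16(input_shape=(224, 224, 3), weights='imagenet', include_top=False)
for layer in vgg.layers:
    layer.trainable = True

The next issue is this code: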

test_set = train_datagen.flow_from_directory('chest_xray/test',
                                                 target_size = (224, 224),
                                                 batch_size = 32,
                                                 class_mode = 'categorical')
# where
train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)

You do not want to perform image augmentation on a test set, which is what you are doing here, so use:

test_set = ImageDataGenerator(rescale = 1./255).flow_from_directory('chest_xray/test',
                                                 target_size = (224, 224),
                                                 batch_size = 32,
                                                 class_mode = 'categorical',
                                                 shuffle=False)

Next, you are using model.fit_generator; that will work for now but is being deprecated in future versions of TensorFlow. Use model.fit; it now works with generators (I believe from TensorFlow 2.1 on). What you call the test set is normally called the validation set, so that's OK. You have specified batch_size=32; however, you have this code in model.fit:

steps_per_epoch=len(training_set),
validation_steps=len(test_set)
# what you want is the sample count divided by the batch size; note that
# len() of a flow_from_directory iterator already returns the number of
# batches, so divide the sample counts rather than the iterator lengths:
steps_per_epoch=training_set.samples//32
validation_steps=test_set.samples//32
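Putting that together, a sketch of the equivalent model.fit call (same generators as above; the epoch count is illustrative):

r = model.fit(
    training_set,
    validation_data=test_set,
    epochs=5,
    steps_per_epoch=training_set.samples // 32,
    validation_steps=test_set.samples // 32)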

Once you train your model, it has the weights you desire to use for Dataset2. Just create new generators for Dataset2 and retrain it with model.fit, for example as sketched below.
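A minimal sketch of that retraining step, assuming Dataset2 lives under the hypothetical directories dataset2/train and dataset2/test, reusing the trained model from above (the lower learning rate is illustrative; assumes import tensorflow as tf):

train_gen2 = ImageDataGenerator(rescale = 1./255).flow_from_directory('dataset2/train',
                                                 target_size = (224, 224),
                                                 batch_size = 32,
                                                 class_mode = 'categorical')
val_gen2 = ImageDataGenerator(rescale = 1./255).flow_from_directory('dataset2/test',
                                                 target_size = (224, 224),
                                                 batch_size = 32,
                                                 class_mode = 'categorical',
                                                 shuffle=False)

# recompile with a lower learning rate before fine-tuning on Dataset2
model.compile(
    loss='categorical_crossentropy',
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    metrics=['accuracy'])

r2 = model.fit(train_gen2, validation_data=val_gen2, epochs=5)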
