How can I do transfer learning without ImageNet weights?

Question

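Here is a description of my project:

Dataset1: a larger dataset containing images from two (binary) classes.

Dataset2: contains 2 classes that look very similar to those in Dataset1. I want to build a model that learns from Dataset1 and then, via transfer learning, applies those weights to Dataset2 with a lower learning rate.

So I want to train the whole VGG16 on dataset1, and then use transfer learning to fine-tune the last layer on dataset2. I do not want to use the pretrained ImageNet weights. This is the code I am using, and I have saved the weights from it: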


from tensorflow.keras.layers import Input, Lambda, Dense, Flatten
from tensorflow.keras.models import Model
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.preprocessing import image
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
import numpy as np
from glob import glob
import matplotlib.pyplot as plt

vgg = VGG16(input_shape=(244, 244, 3), weights=None, include_top=False)

# don't train existing weights
for layer in vgg.layers:
    layer.trainable = False
    
x = Flatten()(vgg.output)   

import tensorflow.keras
prediction = tensorflow.keras.layers.Dense(2, activation='softmax')(x)

model = Model(inputs=vgg.input, outputs=prediction)

model.compile(
  loss='categorical_crossentropy',
  optimizer='adam',
  metrics=['accuracy']
)

from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)

test_datagen = ImageDataGenerator(rescale = 1./255)

training_set = train_datagen.flow_from_directory('chest_xray/train',
                                                 target_size = (224, 224),
                                                 batch_size = 32,
                                                 class_mode = 'categorical')

test_set = train_datagen.flow_from_directory('chest_xray/test',
                                                 target_size = (224, 224),
                                                 batch_size = 32,
                                                 class_mode = 'categorical')

# fit the model
r = model.fit_generator(
  training_set,
  validation_data=test_set,
  epochs=5,
  steps_per_epoch=len(training_set),
  validation_steps=len(test_set)
)

model.save_weights('first_try.h5')

Answer 1

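Update

Based on your question, it seems the number of classes will not differ in Dataset2. You also do not want to use the ImageNet weights. In that case you do not need to map or store weights (as described below). Just load the model with its weights and train on Dataset2. Freeze all the layers trained on Dataset1 and train only the last layer on Dataset2; it is really straightforward.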
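
A minimal sketch of that workflow, assuming a tiny stand-in network instead of VGG16 and random arrays instead of Dataset2. The helper build_model, the file name dataset1.weights.h5 and the learning rate 1e-4 are illustrative choices of mine, not taken from the question:

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def build_model():
    # Tiny stand-in for the VGG16-based network (hypothetical architecture).
    inputs = tf.keras.Input(shape=(32, 32, 3))
    x = tf.keras.layers.Conv2D(8, 3, activation="relu")(inputs)
    x = tf.keras.layers.GlobalMaxPooling2D()(x)
    outputs = tf.keras.layers.Dense(2, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)


# Stage 1: pretend this model was trained on Dataset1, then save its weights.
stage1 = build_model()
weights_path = os.path.join(tempfile.mkdtemp(), "dataset1.weights.h5")
stage1.save_weights(weights_path)

# Stage 2: rebuild the same architecture and load the Dataset1 weights.
stage2 = build_model()
stage2.load_weights(weights_path)

# Freeze every layer except the final classifier.
for layer in stage2.layers[:-1]:
    layer.trainable = False

# Compile after setting `trainable`, using a low learning rate.
stage2.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

# Random placeholder data standing in for Dataset2.
x2 = np.random.rand(16, 32, 32, 3).astype("float32")
y2 = tf.keras.utils.to_categorical(
    np.random.randint(0, 2, size=16), num_classes=2)

# layers[1] is the Conv2D layer; its kernel should not change during training.
frozen_before = stage2.layers[1].get_weights()[0].copy()
stage2.fit(x2, y2, batch_size=8, epochs=1, verbose=0)
frozen_after = stage2.layers[1].get_weights()[0]
```

Comparing frozen_before with frozen_after is a quick way to confirm that only the last layer was updated.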

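Although you do not need all of it, I am keeping my original reply below for future reference.

Here is a small demo of what you might need. I hope it gives you some insight. We will train on the CIFAR dataset, which has 10 classes, and then try to use the result for transfer learning on a different dataset that may have a different input size and a different number of classes.

Prepare CIFAR (10 classes)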

import numpy as np
import tensorflow as tf 
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, Dropout

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# train set / data 
x_train = x_train.astype('float32') / 255

# validation set / data 
x_test = x_test.astype('float32') / 255

# train set / target 
y_train = tf.keras.utils.to_categorical(y_train, num_classes=10)
# validation set / target 
y_test = tf.keras.utils.to_categorical(y_test, num_classes=10)

print(x_train.shape, y_train.shape) 
print(x_test.shape, y_test.shape)  
'''
(50000, 32, 32, 3) (50000, 10)
(10000, 32, 32, 3) (10000, 10)
'''

Model

# declare input shape 
input = tf.keras.Input(shape=(32,32,3))
# Block 1
x = tf.keras.layers.Conv2D(32, 3, strides=2, activation="relu")(input)
x = tf.keras.layers.MaxPooling2D(3)(x)

# Now we apply global max pooling.
gap = tf.keras.layers.GlobalMaxPooling2D()(x)

# Finally, we add a classification layer.
output = tf.keras.layers.Dense(10, activation='softmax')(gap)

# bind all
func_model = tf.keras.Model(input, output)

'''
Model: "functional_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_2 (InputLayer)         [(None, 32, 32, 3)]       0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 15, 15, 32)        896       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 32)          0         
_________________________________________________________________
global_max_pooling2d_1 (Glob (None, 32)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                330       
=================================================================
Total params: 1,226
Trainable params: 1,226
Non-trainable params: 0
'''

Train the model to get some weight matrices, as follows:

# compile 
print('\nFunctional API')
func_model.compile(
          loss      = tf.keras.losses.CategoricalCrossentropy(),
          metrics   = tf.keras.metrics.CategoricalAccuracy(),
          optimizer = tf.keras.optimizers.Adam())
# fit 
func_model.fit(x_train, y_train, batch_size=128, epochs=1)

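Transfer learning

Let's use it on MNIST. It also has 10 classes, but because we want a different number of classes, we will derive even and odd classes from it (2 classes). Here is how we prepare the dataset: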

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# train set / data 
x_train = np.expand_dims(x_train, axis=-1)
x_train = np.repeat(x_train, 3, axis=-1)
x_train = x_train.astype('float32') / 255
# train set / target 
y_train = tf.keras.utils.to_categorical((y_train % 2 == 0).astype(int), 
                                        num_classes=2)

# validation set / data 
x_test = np.expand_dims(x_test, axis=-1)
x_test = np.repeat(x_test, 3, axis=-1)
x_test = x_test.astype('float32') / 255
# validation set / target 

y_test = tf.keras.utils.to_categorical((y_test % 2 == 0).astype(int), 
                                       num_classes=2)

print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)  
'''
(60000, 28, 28, 3) (60000, 2)
(10000, 28, 28, 3) (10000, 2)
'''

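If you are familiar with using pretrained weights in Keras models, you have probably used include_top. Setting it to False lets you load a weight file without the top (classifier) part of the pretrained model. Here we need to do that (somewhat) manually. We need to take the weight matrices up to, but not including, the last activation layer (in our case Dense(10, softmax)), put them into a new instance of the base model, and then add a new classifier layer (in our example Dense(2, softmax)).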

for i, layer in enumerate(func_model.layers):
    print(i,'\t',layer.trainable,'\t  :',layer.name)

'''
  Train_Bool  : Layer Names
0    True     : input_1
1    True     : conv2d
2    True     : max_pooling2d
3    True     : global_max_pooling2d # < we go till here to grab the weight and biases
4    True     : dense  # 10 classes (from previous model)
'''

Get the weights

sparsified_weights = []
for w in func_model.get_layer(name='global_max_pooling2d').get_weights():
    sparsified_weights.append(w)

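This way, we map the weights from the old model, except for the classifier layer (Dense). Note that here we grab the weights up to the global pooling layer, which sits right before the classifier. Be aware that a global pooling layer has no parameters of its own, so get_weights() on it returns an empty list; the convolution weights still carry over in the next step because the new model is built directly on the old model's layers.

Now we create a new model that is identical to the old one except for the last layer (the 10-unit Dense), and add a new Dense layer with 2 units.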

predictions    = Dense(2, activation='softmax')(func_model.layers[-2].output)
new_func_model = Model(inputs=func_model.inputs, outputs = predictions)

Now we can set the weights for the new model as follows:

new_func_model.get_layer(name='global_max_pooling2d').set_weights(sparsified_weights)

You can verify this as follows; everything will be the same except the last layer.

func_model.get_weights()      # last layer, Dense (10)
new_func_model.get_weights()  # last layer, Dense (2)
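
If the new model is built from scratch instead of on top of the old model's layers, the weights have to be copied explicitly. Here is a rough sketch of one way to do that, copying every layer except the classifier; the build helper and the names old_model and new_model are my own, and old_model is left untrained here just to keep the example short:

```python
import numpy as np
import tensorflow as tf


def build(num_classes, input_shape=(32, 32, 3)):
    # Same small architecture as the demo, with a configurable head size.
    inputs = tf.keras.Input(shape=input_shape)
    x = tf.keras.layers.Conv2D(32, 3, strides=2, activation="relu")(inputs)
    x = tf.keras.layers.MaxPooling2D(3)(x)
    x = tf.keras.layers.GlobalMaxPooling2D()(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)


old_model = build(num_classes=10)  # stands in for the CIFAR-trained model
new_model = build(num_classes=2)   # fresh model with a 2-class head

# Copy weights layer by layer, skipping the final classifier layer.
# Layers without parameters (input, pooling) simply copy an empty list.
for old_layer, new_layer in zip(old_model.layers[:-1], new_model.layers[:-1]):
    new_layer.set_weights(old_layer.get_weights())

# layers[3] is GlobalMaxPooling2D, which holds no weights at all.
pool_weights = old_model.layers[3].get_weights()

# layers[1] is Conv2D; its kernel should now match in both models.
old_kernel = old_model.layers[1].get_weights()[0]
new_kernel = new_model.layers[1].get_weights()[0]
```

Only the Conv2D layer actually carries parameters in this small network, so that is the layer whose weights end up transferred.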

Now you can train the model on the new dataset, MNIST in our case.

new_func_model.compile(optimizer='adam', loss='categorical_crossentropy')
new_func_model.summary()

'''
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 32, 32, 3)]       0         
_________________________________________________________________
conv2d (Conv2D)              (None, 15, 15, 32)        896       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 5, 5, 32)          0         
_________________________________________________________________
global_max_pooling2d (Global (None, 32)                0         
_________________________________________________________________
dense_6 (Dense)              (None, 2)                 66        
=================================================================
Total params: 962
Trainable params: 962
Non-trainable params: 0
'''

# compile 
print('\nFunctional API')
new_func_model.compile(
          loss      = tf.keras.losses.CategoricalCrossentropy(),
          metrics   = tf.keras.metrics.CategoricalAccuracy(),
          optimizer = tf.keras.optimizers.Adam())
# fit 
new_func_model.fit(x_train, y_train, batch_size=128, epochs=1)

WARNING:tensorflow:Model was constructed with shape (None, 32, 32, 3) for input Tensor("input_1:0", shape=(None, 32, 32, 3), dtype=float32), but it was called on an input with incompatible shape (None, 28, 28, 3).
WARNING:tensorflow:Model was constructed with shape (None, 32, 32, 3) for input Tensor("input_1:0", shape=(None, 32, 32, 3), dtype=float32), but it was called on an input with incompatible shape (None, 28, 28, 3).
469/469 [==============================] - 1s 3ms/step - loss: 0.6453 - categorical_accuracy: 0.6447
<tensorflow.python.keras.callbacks.History at 0x7f7af016feb8>
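
The warning appears because the model was declared with 32x32 inputs but is fed 28x28 images. Training still works here because global pooling removes the spatial dimensions before the Dense layer. One possible way to avoid the warning, sketched below under that assumption, is to declare the spatial dimensions as None; the name flexible_model is mine:

```python
import numpy as np
import tensorflow as tf

# Height and width are left unspecified; only the channel count is fixed.
inputs = tf.keras.Input(shape=(None, None, 3))
x = tf.keras.layers.Conv2D(32, 3, strides=2, activation="relu")(inputs)
x = tf.keras.layers.MaxPooling2D(3)(x)
# Global pooling collapses whatever spatial size remains to a 32-vector.
x = tf.keras.layers.GlobalMaxPooling2D()(x)
outputs = tf.keras.layers.Dense(2, activation="softmax")(x)
flexible_model = tf.keras.Model(inputs, outputs)

# The same model accepts CIFAR-sized and MNIST-sized batches.
out_32 = flexible_model.predict(
    np.zeros((4, 32, 32, 3), dtype="float32"), verbose=0)
out_28 = flexible_model.predict(
    np.zeros((4, 28, 28, 3), dtype="float32"), verbose=0)
```

This only works for architectures without a Flatten layer before the classifier, since Flatten needs a fixed spatial size.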

Answer 2

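A few issues. You are not using the ImageNet weights (I can't imagine why not), and then you set all the layers of the VGG network to non-trainable. So you start with random weights and they stay random. You then add a Flatten layer and a prediction layer and try to train. All you will be training is a single dense layer. I doubt this will work well, but I suppose it will learn something. I would at least use the ImageNet weights, and I also prefer to train the whole model for best results. The next issue is this code: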

test_set = train_datagen.flow_from_directory('chest_xray/test',
                                                 target_size = (224, 224),
                                                 batch_size = 32,
                                                 class_mode = 'categorical')
#where
train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)

You do not want to apply image augmentation to the test set, which is what this code does, so use:

test_set = ImageDataGenerator(rescale = 1./255).flow_from_directory('chest_xray/test',
                                                 target_size = (224, 224),
                                                 batch_size = 32,
                                                 class_mode = 'categorical',
                                                 shuffle=False)

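Next, you are using model.fit_generator, which still works but is deprecated and will be removed in a future version of TensorFlow. Use model.fit instead; in TensorFlow 2.x it accepts generators directly. What you call the test set is usually called the validation set, but that is fine. You have specified batch_size=32, and in model.fit you have the code

steps_per_epoch=len(training_set),
validation_steps=len(test_set)
# len() of these generators is already the number of batches per epoch, so it
# should not be divided by the batch size again. To derive the step counts from
# the number of images instead (dropping the last partial batch), use
steps_per_epoch=training_set.samples // 32,
validation_steps=test_set.samples // 32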
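
As far as I know, len() on a Keras directory iterator returns the number of batches, rounded up, rather than the number of images. The arithmetic can be sketched in plain Python; batches_per_epoch and the dataset size of 1000 are hypothetical values chosen only for illustration:

```python
import math


def batches_per_epoch(num_samples, batch_size):
    """Batches needed to see every sample once (last batch may be partial)."""
    return math.ceil(num_samples / batch_size)


num_samples = 1000  # hypothetical dataset size
batch_size = 32

# This is what len(generator) would report for such a dataset.
steps = batches_per_epoch(num_samples, batch_size)

# Dividing that batch count by the batch size again leaves almost no steps.
steps_divided_again = steps // batch_size

print(steps, steps_divided_again)
```

With these numbers an epoch needs 32 steps, while dividing a second time would leave a single step, so most of the data would never be seen in an epoch.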

Once you have trained your model, it holds the weights you want to use for dataset2. Just create new generators for dataset2 and retrain it with model.fit.
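
To make the first point of this answer concrete, here is a rough sketch that counts the trainable parameters of the question's architecture with the VGG16 base frozen and then unfrozen. count_trainable is a helper name I made up, and I assumed a 224x224 input to match the generators' target_size:

```python
import numpy as np
import tensorflow as tf


def count_trainable(model):
    # Sum the number of elements in every trainable weight tensor.
    return sum(int(np.prod(tuple(w.shape))) for w in model.trainable_weights)


# weights=None gives randomly initialised layers; nothing is downloaded.
vgg = tf.keras.applications.VGG16(
    input_shape=(224, 224, 3), weights=None, include_top=False)
x = tf.keras.layers.Flatten()(vgg.output)
prediction = tf.keras.layers.Dense(2, activation="softmax")(x)
model = tf.keras.Model(inputs=vgg.input, outputs=prediction)

# Case 1: base frozen, as in the question. Only the Dense head can learn.
for layer in vgg.layers:
    layer.trainable = False
frozen_count = count_trainable(model)

# Case 2: base unfrozen, so the whole network can train on Dataset1.
for layer in vgg.layers:
    layer.trainable = True
full_count = count_trainable(model)

print(frozen_count, full_count)
```

With the base frozen, only the Dense layer's kernel and bias are left to train, which is why training the whole network on Dataset1 first is the more sensible starting point.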

Answer 1
2, accepted, 2020-12-04 03:53:12

Answer 2
1, 2020-12-04 05:28:13
