ResNet50 model is not learning with transfer learning in Keras
I am trying to perform transfer learning with a ResNet50 model, pretrained on ImageNet weights, on the PASCAL VOC 2012 dataset. As it is a multi-label dataset, I am using a sigmoid activation function in the final layer and binary_crossentropy loss. The metrics are precision, recall and accuracy. Below is the code I used to build the model for 20 classes (PASCAL VOC has 20 classes).
from tensorflow.keras import applications
from tensorflow.keras.layers import Dense, Dropout, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping
from sklearn.model_selection import train_test_split

img_height, img_width = 128, 128
num_classes = 20
# If imagenet weights are being loaded,
# input must have a static square shape (one of (128, 128), (160, 160), (192, 192), or (224, 224))
base_model = applications.resnet50.ResNet50(weights='imagenet', include_top=False,
                                            input_shape=(img_height, img_width, 3))
x = base_model.output
x = GlobalAveragePooling2D()(x)
#x = Dropout(0.7)(x)
predictions = Dense(num_classes, activation='sigmoid')(x)
model = Model(inputs=base_model.input, outputs=predictions)

# Make the last two layers trainable and freeze everything else
for layer in model.layers[-2:]:
    layer.trainable = True
for layer in model.layers[:-3]:
    layer.trainable = False

adam = Adam(learning_rate=0.0001)
model.compile(optimizer=adam, loss='binary_crossentropy', metrics=['accuracy', precision_m, recall_m])
#print(model.summary())

# x_train, y, epochs and batch_size are defined earlier (not shown in the question)
X_train, X_test, Y_train, Y_test = train_test_split(x_train, y, random_state=42, test_size=0.2)

savingcheckpoint = ModelCheckpoint('ResnetTL.h5', monitor='val_loss', verbose=1, save_best_only=True, mode='min')
earlystopcheckpoint = EarlyStopping(monitor='val_loss', patience=10, verbose=1, mode='min', restore_best_weights=True)
model.fit(X_train, Y_train, epochs=epochs, validation_data=(X_test, Y_test), batch_size=batch_size,
          callbacks=[savingcheckpoint, earlystopcheckpoint], shuffle=True)
model.save_weights('ResnetTLweights.h5')
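The precision_m and recall_m metrics passed to compile() are custom functions that are not shown in the question. A minimal sketch of how such metrics are commonly written with the Keras backend (an assumption, since the question does not include them):

from tensorflow.keras import backend as K

def precision_m(y_true, y_pred):
    # assumed implementation: true positives / predicted positives
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    return true_positives / (predicted_positives + K.epsilon())

def recall_m(y_true, y_pred):
    # assumed implementation: true positives / actual positives
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    return true_positives / (possible_positives + K.epsilon())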
It ran for 35 epochs until early stopping kicked in, and the metrics are as follows (without the Dropout layer):
loss: 0.1195 - accuracy: 0.9551 - precision_m: 0.8200 - recall_m: 0.5420 - val_loss: 0.3535 - val_accuracy: 0.8358 - val_precision_m: 0.0583 - val_recall_m: 0.0757
Even with the Dropout layer, I don't see much difference.
loss: 0.1584 - accuracy: 0.9428 - precision_m: 0.7212 - recall_m: 0.4333 - val_loss: 0.3508 - val_accuracy: 0.8783 - val_precision_m: 0.0595 - val_recall_m: 0.0403
With dropout, for a few epochs, the model reaches a validation precision and accuracy of 0.2, but not above that.
I see that the precision and recall on the validation set are pretty low compared to the training set, both with and without the dropout layer. How should I interpret this? Does this mean the model is overfitting? If so, what should I do? As of now the model's predictions are quite random (totally incorrect). The dataset size is 11000 images.
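For reference, here is a minimal sketch of how the validation precision and recall could be cross-checked outside Keras by thresholding the sigmoid outputs at 0.5 (assuming the model, X_test and Y_test from the code above):

import numpy as np
from sklearn.metrics import precision_score, recall_score

y_prob = model.predict(X_test)        # per-class sigmoid probabilities
y_pred = (y_prob >= 0.5).astype(int)  # multi-hot predictions
print('precision:', precision_score(Y_test, y_pred, average='micro', zero_division=0))
print('recall:', recall_score(Y_test, y_pred, average='micro', zero_division=0))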
Please modify the code as below and try executing it.
From:
predictions = Dense(num_classes, activation= 'sigmoid')(x)
To:
predictions = Dense(num_classes, activation= 'softmax')(x)
From:
model.compile(optimizer= adam, loss='binary_crossentropy', metrics=['accuracy',precision_m,recall_m])
To:
model.compile(optimizer= adam, loss='categorical_crossentropy', metrics=['accuracy',precision_m,recall_m])
This question is pretty old, but I'll answer it in case it is helpful to someone else:
In this example, you froze all layers except the last two (the Global Average Pooling and the final Dense layer). There is a cleaner way to achieve the same state:
rn50 = applications.resnet50.ResNet50(weights='imagenet', include_top=False,
                                      input_shape=(img_height, img_width, 3))
x = rn50.output
x = GlobalAveragePooling2D()(x)
predictions = Dense(num_classes, activation='sigmoid')(x)
model = Model(inputs=rn50.input, outputs=predictions)
rn50.trainable = False  # <- this freezes the whole backbone at once
model.compile(...)
In this case, features are being extracted from the ResNet50 network and fed to a linear classifier, but the ResNet50 weights themselves are not being trained. This is called feature extraction, not fine-tuning.
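A quick way to verify the freeze took effect (a hypothetical check, not part of the original answer): with rn50 frozen, the only trainable parameters left should be those of the Dense head.

trainable = sum(w.shape.num_elements() for w in model.trainable_weights)
print('trainable parameters:', trainable)  # Dense head only: 2048 * 20 + 20 = 40,980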
The only weights being trained are those of your classifier, which was instantiated with weights drawn from a random distribution and thus should be trained in full. You should use Adam with its default learning rate:
model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.001))
So you can train it for a few epochs and, once that's done, unfreeze the backbone and "fine-tune" it:
rn50.trainable = False  # phase 1: train the head with a frozen backbone
model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.001))
model.fit(epochs=50)

rn50.trainable = True   # phase 2: unfreeze the backbone and fine-tune
model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.00001))  # much lower learning rate
model.fit(epochs=60, initial_epoch=50)  # resume counting from epoch 50
There is a nice article about this on the Keras website: https://keras.io/guides/transfer_learning/