Why is my resnet50 model in Keras not converging?

I am currently trying to classify images of integrated circuits as defective or non-defective. I already tried VGG16 and InceptionV3 and got really good results for both (95 % validation accuracy and low validation loss). Now I wanted to try resnet50, but my model is not converging. Its accuracy reaches 95 % too, but the validation loss keeps increasing while the validation accuracy gets stuck at 50 %.

This is my script so far:

from keras.applications.resnet50 import ResNet50
from keras.optimizers import Adam
from keras.preprocessing import image
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D, Dropout
from keras import backend as K
from keras_preprocessing.image import ImageDataGenerator
import tensorflow as tf

class ResNet:
    def __init__(self):
        self.img_width, self.img_height = 224, 224  # Dimensions of cropped image
        self.classes_num = 2  # Number of classifications

        # Training configurations
        self.epochs = 32
        self.batch_size = 16  # Play with this to determine number of images to train on per epoch
        self.lr = 0.0001

    def build_model(self, train_path):
        train_data_path = train_path
        train_datagen = ImageDataGenerator(rescale=1. / 255, validation_split=0.25)

        train_generator = train_datagen.flow_from_directory(
            train_data_path,
            target_size=(self.img_height, self.img_width),
            color_mode="rgb",
            batch_size=self.batch_size,
            class_mode='categorical',
            subset='training')

        validation_generator = train_datagen.flow_from_directory(
            train_data_path,
            target_size=(self.img_height, self.img_width),
            color_mode="rgb",
            batch_size=self.batch_size,
            class_mode='categorical',
            subset='validation')

        # create the base pre-trained model
        base_model = ResNet50(weights='imagenet', include_top=False,
                              input_shape=(self.img_height, self.img_width, 3))

        # add a global spatial average pooling layer
        x = base_model.output
        x = GlobalAveragePooling2D()(x)
        # let's add a fully-connected layer
        x = Dense(1024, activation='relu')(x)
        #x = Dropout(0.3)(x)
        # and a logistic layer -- let's say we have 200 classes
        predictions = Dense(2, activation='softmax')(x)

        # this is the model we will train
        model = Model(inputs=base_model.input, outputs=predictions)

        # first: train only the top layers (which were randomly initialized)
        # i.e. freeze all convolutional InceptionV3 layers
        for layer in base_model.layers:
            layer.trainable = True

        # compile the model (should be done *after* setting layers to non-trainable)
        opt = Adam(self.lr)  # , decay=self.INIT_LR / self.NUM_EPOCHS)
        model.compile(opt, loss='binary_crossentropy', metrics=["accuracy"])

        # train the model on the new data for a few epochs
        from keras.callbacks import ModelCheckpoint, EarlyStopping
        import matplotlib.pyplot as plt

        checkpoint = ModelCheckpoint('resnetModel.h5', monitor='val_accuracy', verbose=1, save_best_only=True,
                                 save_weights_only=False, mode='auto', period=1)

        early = EarlyStopping(monitor='val_accuracy', min_delta=0, patience=16, verbose=1, mode='auto')
        hist = model.fit_generator(steps_per_epoch=self.batch_size, generator=train_generator,
                               validation_data=validation_generator, validation_steps=self.batch_size, epochs=self.epochs,
                               callbacks=[checkpoint, early])

        plt.plot(hist.history['accuracy'])
        plt.plot(hist.history['val_accuracy'])
        plt.plot(hist.history['loss'])
        plt.plot(hist.history['val_loss'])
        plt.title("model accuracy")
        plt.ylabel("Accuracy")
        plt.xlabel("Epoch")
        plt.legend(["Accuracy", "Validation Accuracy", "loss", "Validation Loss"])
        plt.show()

        plt.figure(1)

if __name__ == '__main__':
    x = ResNet()
    config = tf.compat.v1.ConfigProto()
    config.gpu_options.allow_growth = True
    sess = tf.compat.v1.Session(config=config)
    x.build_model("C:/Users/but/Desktop/dataScratch/Train")

And this is the training of the model:

[image: plot of the training run]

What could be the reason for ResNet to fail while VGG and Inception work? Are there any mistakes in my script?

At least for the code, I don't see any mistakes that might affect the training process.

# and a logistic layer -- let's say we have 200 classes
predictions = Dense(2, activation='softmax')(x)

Those lines are a bit suspicious. But the mismatch seems to be only in the comment (it mentions 200 classes while the layer has 2 units), so it should be okay.
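As a side check on that output layer: the script compiles the 2-unit softmax with `binary_crossentropy`. The sketch below, in plain Python, assumes the loss is the element-wise binary cross-entropy averaged over the two outputs, which is my understanding of how Keras computes it. The helper functions are hypothetical stand-ins, not Keras APIs. Under that assumption the pairing gives the same value as categorical cross-entropy for one-hot labels, so it is unlikely to be what breaks the training.

```python
import math

def binary_crossentropy(y_true, y_pred):
    # Element-wise binary cross-entropy, averaged over the outputs.
    terms = [-(t * math.log(p) + (1 - t) * math.log(1 - p))
             for t, p in zip(y_true, y_pred)]
    return sum(terms) / len(terms)

def categorical_crossentropy(y_true, y_pred):
    # Standard cross-entropy between a one-hot label and a softmax output.
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))

# One-hot label for class 0 and a softmax output that sums to 1.
y_true = [1.0, 0.0]
y_pred = [0.8, 0.2]

bce = binary_crossentropy(y_true, y_pred)
cce = categorical_crossentropy(y_true, y_pred)
print(bce, cce)
```

With two outputs that sum to 1, both per-output terms reduce to the negative log of the probability assigned to the true class, which is why the two losses agree.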

# first: train only the top layers (which were randomly initialized)
# i.e. freeze all convolutional InceptionV3 layers
for layer in base_model.layers:
    layer.trainable = True

These are suspicious too. If you want to freeze the ResNet-50 layers, what you need to do is

...
base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(self.img_height, self.img_width, 3))
for layer in base_model.layers:
    layer.trainable = False
...

But it turned out that layer.trainable = True was actually your intention, so it wouldn't matter either.
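To confirm which of the two settings is actually in effect, one can count the trainable parameters before and after the loop. This is only a minimal sketch: it assumes the `tensorflow.keras` flavour of the API (the question imports from standalone `keras`), it uses `weights=None` to avoid downloading the ImageNet weights, and `count_params` is a hypothetical helper defined here.

```python
import math
from tensorflow.keras.applications import ResNet50

def count_params(weights):
    # Total number of scalar parameters in a list of weight variables.
    return sum(math.prod(int(d) for d in w.shape) for w in weights)

# weights=None is an assumption to keep the sketch offline; the question uses 'imagenet'.
base_model = ResNet50(weights=None, include_top=False, input_shape=(224, 224, 3))
trainable_before = count_params(base_model.trainable_weights)

# Freeze every layer of the base model.
for layer in base_model.layers:
    layer.trainable = False

trainable_after = count_params(base_model.trainable_weights)
print("trainable before:", trainable_before)
print("trainable after: ", trainable_after)
```

If the second number is not zero after freezing, or is zero when fine-tuning was intended, the loop is not doing what the comments claim.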

First of all, if you are using the same code that you used for training VGG16 and Inception V3, it is unlikely that the code is the problem.

Why don't you check the following possible causes?

  • The model may be too small or too big, so that it underfits or overfits (number of parameters).
  • The model may need more time to converge (training for more epochs).
  • ResNet may not be suited for this classification.
  • The pretrained weights that you used may not be suited for this classification.
  • The learning rate may be too small or too big.
  • etc.
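To illustrate the learning-rate point from the list above, here is a toy sketch in plain Python. It runs plain gradient descent on f(w) = w², which is an assumption made purely for illustration: it is neither the ResNet model nor the Adam optimizer from the question, and the three learning rates are arbitrary. It only shows the general pattern of a step size that is too small, reasonable, or too big.

```python
def gradient_descent(lr, steps=50, w0=1.0):
    # Minimise f(w) = w**2, whose gradient is 2*w and whose minimum is at w = 0.
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w
    return w

too_small = gradient_descent(lr=1e-4)   # barely moves away from the start
reasonable = gradient_descent(lr=0.1)   # gets close to the minimum
too_big = gradient_descent(lr=1.1)      # overshoots further on every step

print("lr=1e-4:", too_small)
print("lr=0.1: ", reasonable)
print("lr=1.1: ", too_big)
```

In practice the same check means rerunning the training with a few learning rates above and below the current 0.0001 and comparing the validation curves.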
