
Low VGG16 validation accuracy on biased dataset

I am new to machine learning and I have a retina image dataset of about 35K images spread over 5 different labels. It is a biased (class-imbalanced) dataset. The numbers of images for training, testing and validation are:

Total number of images for "training": Found 28084 images belonging to 5 classes.
Total number of images for "validation": Found 3508 images belonging to 5 classes.
Total number of images for "testing": Found 3516 images belonging to 5 classes.

The VGG16 model I used for training is:

    # imports assumed for this snippet (tf.keras); nb_categories and the
    # train/val/test directory variables are defined elsewhere
    import tensorflow
    from tensorflow.keras import models, layers
    from tensorflow.keras.applications import vgg16
    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    from tensorflow.keras.callbacks import ModelCheckpoint

    img_height, img_width = 224, 224
    conv_base = vgg16.VGG16(weights='imagenet', include_top=False, pooling='max', input_shape = (img_width, img_height, 3))
    # check whether the model layers are trainable or not
    for layer in conv_base.layers:
        layer.trainable=True
        print(layer, layer.trainable)
    model = models.Sequential()
    model.add(conv_base)
    model.add(layers.Dense(nb_categories, activation='softmax'))
    model.summary()
    # the number of images to load at each iteration
    batch_size = 32
    # only rescaling
    train_datagen =  ImageDataGenerator(
        rescale=1./255
    )
    test_datagen =  ImageDataGenerator(
        rescale=1./255
    )
    # these are generators for train/test data that will read pictures
    # found in the defined subfolders of 'data/'
    print('Total number of images for "training":')
    train_generator = train_datagen.flow_from_directory(
        train_data_dir,
        target_size=(img_height, img_width),
        batch_size=batch_size,
        class_mode="categorical")
    print('Total number of images for "validation":')
    val_generator = test_datagen.flow_from_directory(
        val_data_dir,
        target_size=(img_height, img_width),
        batch_size=batch_size,
        class_mode="categorical",
        shuffle=False)
    print('Total number of images for "testing":')
    test_generator = test_datagen.flow_from_directory(
        test_data_dir,
        target_size=(img_height, img_width),
        batch_size=batch_size,
        class_mode="categorical",
        shuffle=False)
    learning_rate = 5e-5
    epochs = 25
    checkpoint = ModelCheckpoint("25_classifier.h5", monitor='val_acc', verbose=1,
                                 save_best_only=True, save_weights_only=False,
                                 mode='auto', period=1)
    model.compile(loss="categorical_crossentropy",
                  optimizer=tensorflow.optimizers.Adam(lr=learning_rate, clipnorm=1., epsilon=1e-8),
                  metrics=['acc'])
    history = model.fit_generator(train_generator,
                                  epochs=epochs,
                                  shuffle=True,
                                  validation_data=val_generator,
                                  steps_per_epoch=120,
                                  callbacks=[checkpoint])

This model gives the following accuracy:

Epoch 1/25
  2/120 [..............................] - ETA: 1:31 - loss: 0.5271 - acc: 0.8281WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.2479s vs `on_train_batch_end` time: 0.6596s). Check your callbacks.
120/120 [==============================] - ETA: 0s - loss: 0.6356 - acc: 0.7914
Epoch 00001: val_acc improved from -inf to 0.77794, saving model to 25_classifier.h5
120/120 [==============================] - 167s 1s/step - loss: 0.6356 - acc: 0.7914 - val_loss: 0.6813 - val_acc: 0.7779
Epoch 2/25
120/120 [==============================] - ETA: 0s - loss: 0.6415 - acc: 0.7880
Epoch 00002: val_acc improved from 0.77794 to 0.78278, saving model to 25_classifier.h5
120/120 [==============================] - 194s 2s/step - loss: 0.6415 - acc: 0.7880 - val_loss: 0.6530 - val_acc: 0.7828
Epoch 3/25
120/120 [==============================] - ETA: 0s - loss: 0.6485 - acc: 0.7888
Epoch 00003: val_acc did not improve from 0.78278
120/120 [==============================] - 196s 2s/step - loss: 0.6485 - acc: 0.7888 - val_loss: 0.6473 - val_acc: 0.7796
Epoch 4/25
120/120 [==============================] - ETA: 0s - loss: 0.5914 - acc: 0.8073
Epoch 00004: val_acc did not improve from 0.78278
120/120 [==============================] - 197s 2s/step - loss: 0.5914 - acc: 0.8073 - val_loss: 0.6690 - val_acc: 0.7822
Epoch 5/25
120/120 [==============================] - ETA: 0s - loss: 0.5895 - acc: 0.8033
Epoch 00005: val_acc improved from 0.78278 to 0.78791, saving model to 25_classifier.h5
120/120 [==============================] - 198s 2s/step - loss: 0.5895 - acc: 0.8033 - val_loss: 0.6388 - val_acc: 0.7879
Epoch 6/25
120/120 [==============================] - ETA: 0s - loss: 0.6060 - acc: 0.7968
Epoch 00006: val_acc did not improve from 0.78791
120/120 [==============================] - 200s 2s/step - loss: 0.6060 - acc: 0.7968 - val_loss: 0.6338 - val_acc: 0.7873
Epoch 7/25
120/120 [==============================] - ETA: 0s - loss: 0.6043 - acc: 0.7964
Epoch 00007: val_acc did not improve from 0.78791
120/120 [==============================] - 198s 2s/step - loss: 0.6043 - acc: 0.7964 - val_loss: 0.6574 - val_acc: 0.7839
Epoch 8/25
120/120 [==============================] - ETA: 0s - loss: 0.6202 - acc: 0.7969
Epoch 00008: val_acc did not improve from 0.78791
120/120 [==============================] - 197s 2s/step - loss: 0.6202 - acc: 0.7969 - val_loss: 0.6812 - val_acc: 0.7785
Epoch 9/25
120/120 [==============================] - ETA: 0s - loss: 0.5965 - acc: 0.7990
Epoch 00009: val_acc improved from 0.78791 to 0.79247, saving model to 25_classifier.h5
120/120 [==============================] - 194s 2s/step - loss: 0.5965 - acc: 0.7990 - val_loss: 0.6404 - val_acc: 0.7925
Epoch 10/25
120/120 [==============================] - ETA: 0s - loss: 0.5999 - acc: 0.8010
Epoch 00010: val_acc did not improve from 0.79247
120/120 [==============================] - 195s 2s/step - loss: 0.5999 - acc: 0.8010 - val_loss: 0.6558 - val_acc: 0.7836
Epoch 11/25
120/120 [==============================] - ETA: 0s - loss: 0.5878 - acc: 0.8068
Epoch 00011: val_acc did not improve from 0.79247
120/120 [==============================] - 199s 2s/step - loss: 0.5878 - acc: 0.8068 - val_loss: 0.6601 - val_acc: 0.7842
Epoch 12/25
120/120 [==============================] - ETA: 0s - loss: 0.5592 - acc: 0.8104
Epoch 00012: val_acc did not improve from 0.79247
120/120 [==============================] - 200s 2s/step - loss: 0.5592 - acc: 0.8104 - val_loss: 0.6473 - val_acc: 0.7899
Epoch 13/25
120/120 [==============================] - ETA: 0s - loss: 0.5719 - acc: 0.8052
Epoch 00013: val_acc did not improve from 0.79247
120/120 [==============================] - 200s 2s/step - loss: 0.5719 - acc: 0.8052 - val_loss: 0.6539 - val_acc: 0.7802
Epoch 14/25
120/120 [==============================] - ETA: 0s - loss: 0.5697 - acc: 0.8104
Epoch 00014: val_acc did not improve from 0.79247
120/120 [==============================] - 196s 2s/step - loss: 0.5697 - acc: 0.8104 - val_loss: 0.6640 - val_acc: 0.7719
Epoch 15/25
120/120 [==============================] - ETA: 0s - loss: 0.5615 - acc: 0.8141
Epoch 00015: val_acc did not improve from 0.79247
120/120 [==============================] - 192s 2s/step - loss: 0.5615 - acc: 0.8141 - val_loss: 0.6762 - val_acc: 0.7680
Epoch 16/25
120/120 [==============================] - ETA: 0s - loss: 0.5502 - acc: 0.8148
Epoch 00016: val_acc did not improve from 0.79247
120/120 [==============================] - 195s 2s/step - loss: 0.5502 - acc: 0.8148 - val_loss: 0.6522 - val_acc: 0.7871
Epoch 17/25
120/120 [==============================] - ETA: 0s - loss: 0.5348 - acc: 0.8302
Epoch 00017: val_acc did not improve from 0.79247
120/120 [==============================] - 203s 2s/step - loss: 0.5348 - acc: 0.8302 - val_loss: 0.6682 - val_acc: 0.7885
Epoch 18/25
120/120 [==============================] - ETA: 0s - loss: 0.5709 - acc: 0.8115
Epoch 00018: val_acc improved from 0.79247 to 0.79647, saving model to 25_classifier.h5
120/120 [==============================] - 201s 2s/step - loss: 0.5709 - acc: 0.8115 - val_loss: 0.6203 - val_acc: 0.7965
Epoch 19/25
120/120 [==============================] - ETA: 0s - loss: 0.5061 - acc: 0.8380
Epoch 00019: val_acc did not improve from 0.79647
120/120 [==============================] - 200s 2s/step - loss: 0.5061 - acc: 0.8380 - val_loss: 0.7082 - val_acc: 0.7888
Epoch 20/25
120/120 [==============================] - ETA: 0s - loss: 0.5309 - acc: 0.8260
Epoch 00020: val_acc did not improve from 0.79647
120/120 [==============================] - 201s 2s/step - loss: 0.5309 - acc: 0.8260 - val_loss: 0.6347 - val_acc: 0.7868
Epoch 21/25
120/120 [==============================] - ETA: 0s - loss: 0.5303 - acc: 0.8271
Epoch 00021: val_acc did not improve from 0.79647
120/120 [==============================] - 199s 2s/step - loss: 0.5303 - acc: 0.8271 - val_loss: 0.6654 - val_acc: 0.7876
Epoch 22/25
120/120 [==============================] - ETA: 0s - loss: 0.5481 - acc: 0.8193
Epoch 00022: val_acc did not improve from 0.79647
120/120 [==============================] - 198s 2s/step - loss: 0.5481 - acc: 0.8193 - val_loss: 0.6677 - val_acc: 0.7737
Epoch 23/25
120/120 [==============================] - ETA: 0s - loss: 0.5360 - acc: 0.8198
Epoch 00023: val_acc did not improve from 0.79647
120/120 [==============================] - 202s 2s/step - loss: 0.5360 - acc: 0.8198 - val_loss: 0.6521 - val_acc: 0.7948
Epoch 24/25
120/120 [==============================] - ETA: 0s - loss: 0.4920 - acc: 0.8383
Epoch 00024: val_acc improved from 0.79647 to 0.79704, saving model to 25_classifier.h5
120/120 [==============================] - 200s 2s/step - loss: 0.4920 - acc: 0.8383 - val_loss: 0.6370 - val_acc: 0.7970
Epoch 25/25
120/120 [==============================] - ETA: 0s - loss: 0.5045 - acc: 0.8299
Epoch 00025: val_acc did not improve from 0.79704
120/120 [==============================] - 200s 2s/step - loss: 0.5045 - acc: 0.8299 - val_loss: 0.6357 - val_acc: 0.7916

The val loss is not decreasing and, likewise, the val accuracy is not increasing. I applied a dropout layer but the results got worse; then I applied L1 and L2 regularization but the accuracy was only 77.4%. I want at least 90% to 95% accuracy. Please help me, I am badly stuck.
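A minimal sketch, assuming the dropout layer and the L1/L2 regularization mentioned above were attached to the classification head roughly as follows (the dropout rate and regularization factors are placeholders, not the values actually used):

    from tensorflow.keras import layers, models, regularizers

    model = models.Sequential()
    model.add(conv_base)
    model.add(layers.Dropout(0.3))  # placeholder rate, not the value actually used
    model.add(layers.Dense(nb_categories, activation='softmax',
                           kernel_regularizer=regularizers.l1_l2(l1=1e-5, l2=1e-4)))  # placeholder factors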

An imbalanced dataset is a common problem. For example, in your case, if your model only ever predicted level 0 it would be correct about 20000/35000 (roughly 57%) of the time. There are many ways to deal with this. The most obvious, of course, is to find more samples for the under-represented classes; unfortunately that is usually not feasible.

The next thing to try is undersampling. In your case, since you have a fairly large dataset, remove a certain percentage of the images from the over-represented class, say 50% of the images from the level 0 directory. Another approach is to oversample the under-represented classes. Here you try to create "augmented" images for them: you can use an image-processing module such as cv2 to create augmented images and store them in the directories of the under-represented classes, for example by flipping images horizontally, changing the brightness, etc. (a small sketch of this is shown after the class-weight function below).

Tensorflow has a parameter in model.fit called class_weight to help deal with imbalance; the documentation is here. What it does is give the under-represented classes more weight when the loss function is computed. Below is the code for a function I developed to determine the class weights. Set the parameter dir to point to your training directory.

import os

def get_weight_dict(dir):
    most_samples = 0
    class_weight = {}
    # dir is the directory with the training samples organized by class;
    # sort the class sub-directories so the indices line up with Keras's ordering
    class_list = sorted(c for c in os.listdir(dir)
                        if os.path.isdir(os.path.join(dir, c)))
    # first pass: find the class with the highest number of samples
    for c in class_list:
        length = len(os.listdir(os.path.join(dir, c)))  # number of samples in the class directory
        if length > most_samples:
            most_samples = length
    # second pass: weight each class by most_samples divided by its own sample count
    for i, c in enumerate(class_list):
        length = len(os.listdir(os.path.join(dir, c)))
        class_weight[i] = most_samples / length
        # print(i, most_samples, class_weight[i])
    return class_weight
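To illustrate the oversampling-by-augmentation idea mentioned above, here is a minimal sketch (my own addition, not part of the original answer) that uses cv2 to write horizontally flipped, brightness-shifted copies into an under-represented class directory until it roughly reaches a target count; the helper name and the file-extension filter are assumptions:

import os
import cv2
import numpy as np

def augment_class_dir(class_dir, target_count):
    # hypothetical helper: pad an under-represented class directory up to
    # target_count images by writing flipped / brightness-shifted copies
    files = [f for f in os.listdir(class_dir)
             if f.lower().endswith(('.png', '.jpg', '.jpeg'))]
    i = 0
    while files and len(os.listdir(class_dir)) < target_count:
        src = os.path.join(class_dir, files[i % len(files)])
        img = cv2.imread(src)
        if img is not None:
            aug = cv2.flip(img, 1)                                # horizontal flip
            beta = int(np.random.randint(-30, 31))                # random brightness shift
            aug = cv2.convertScaleAbs(aug, alpha=1.0, beta=beta)
            cv2.imwrite(os.path.join(class_dir, 'aug_%d_%s' % (i, files[i % len(files)])), aug)
        i += 1
        if i > 10 * target_count:                                 # safety stop for unreadable files
            break

Calling it with the sample count of the largest class as target_count for each minority-class directory, before building the generators, would roughly balance the on-disk class counts.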

Then, in model.fit, set the class_weight parameter to the result of the function above. Frankly, I have not found this to be very effective.

Another thing I noticed is that you are using ModelCheckpoint. It saves the model with the best monitored value (val_acc in your code) to a location on disk, but to use it you have to load the saved model and then make predictions. I think you are better off using the EarlyStopping callback, because if you set its parameter restore_best_weights=True you do not have to load the model; just set the number of epochs high enough that this callback gets triggered. The documentation for the callback is here. I also suggest you consider the additional callback ReduceLROnPlateau, which automatically reduces the learning rate if the monitored value stops improving. My recommended code for these callbacks is shown below.

import tensorflow as tf

rlronp = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                                              patience=1, verbose=1)
es = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3, verbose=1,
                                      restore_best_weights=True)
callbacks = [rlronp, es]
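Tying this together, a usage sketch (my own, assuming the model, generators and train_data_dir from the question are in scope; the epoch count is arbitrary, just high enough for EarlyStopping to end training):

# class weights from the training directory defined in the question
class_weight = get_weight_dict(train_data_dir)

history = model.fit(train_generator,
                    epochs=40,                      # deliberately high; EarlyStopping stops training
                    validation_data=val_generator,
                    class_weight=class_weight,
                    callbacks=callbacks)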

One final comment. I am not a fan of VGG models because they have on the order of 40 million trainable parameters, so they are computationally expensive and lead to longer training times. I use the MobileNetV2 model, which has only about 4 million trainable parameters and is about as accurate. You can use it in the same way as the VGG model.
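As a rough sketch of that swap (my own, assuming the rest of the pipeline from the question stays the same), only the backbone changes; note that MobileNetV2 expects inputs scaled to [-1, 1], so its own preprocess_input is usually used instead of a plain 1/255 rescale:

from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNetV2, mobilenet_v2

conv_base = MobileNetV2(weights='imagenet', include_top=False,
                        pooling='max', input_shape=(img_width, img_height, 3))
conv_base.trainable = True

model = models.Sequential([
    conv_base,
    layers.Dense(nb_categories, activation='softmax')
])

# e.g. ImageDataGenerator(preprocessing_function=mobilenet_v2.preprocess_input)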
