
Low VGG16 validation accuracy on biased dataset

I am new to machine learning and I have a retina image dataset of about 35K images spread over 5 different labels. It is a biased (class-imbalanced) dataset. The numbers of images for training, testing and validation are:

Total number of images for "training": Found 28084 images belonging to 5 classes.
Total number of images for "validation": Found 3508 images belonging to 5 classes.
Total number of images for "testing": Found 3516 images belonging to 5 classes.

The VGG16 model I used for training is:

    # imports assumed for this snippet (tf.keras); nb_categories and the
    # train/val/test directory variables are defined elsewhere
    import tensorflow
    from tensorflow.keras import models, layers
    from tensorflow.keras.applications import vgg16
    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    from tensorflow.keras.callbacks import ModelCheckpoint

    img_height, img_width = 224, 224
    conv_base = vgg16.VGG16(weights='imagenet', include_top=False, pooling='max', input_shape = (img_width, img_height, 3))
    # check whether the model layers are trainable or not
    for layer in conv_base.layers:
        layer.trainable=True
        print(layer, layer.trainable)
    model = models.Sequential()
    model.add(conv_base)
    model.add(layers.Dense(nb_categories, activation='softmax'))
    model.summary()
    # the number of images to load at each iteration
    batch_size = 32
    # only rescaling
    train_datagen =  ImageDataGenerator(
        rescale=1./255
    )
    test_datagen =  ImageDataGenerator(
        rescale=1./255
    )
    # these are generators for train/test data that will read pictures
    # found in the defined subfolders of 'data/'
    print('Total number of images for "training":')
    train_generator = train_datagen.flow_from_directory(
        train_data_dir,
        target_size=(img_height, img_width),
        batch_size=batch_size,
        class_mode="categorical")
    print('Total number of images for "validation":')
    val_generator = test_datagen.flow_from_directory(
        val_data_dir,
        target_size=(img_height, img_width),
        batch_size=batch_size,
        class_mode="categorical",
        shuffle=False)
    print('Total number of images for "testing":')
    test_generator = test_datagen.flow_from_directory(
        test_data_dir,
        target_size=(img_height, img_width),
        batch_size=batch_size,
        class_mode="categorical",
        shuffle=False)
    learning_rate = 5e-5
    epochs = 25
    checkpoint = ModelCheckpoint("25_classifier.h5", monitor='val_acc', verbose=1,
                                 save_best_only=True, save_weights_only=False,
                                 mode='auto', period=1)
    model.compile(loss="categorical_crossentropy",
                  optimizer=tensorflow.optimizers.Adam(lr=learning_rate, clipnorm=1., epsilon=1e-8),
                  metrics=['acc'])
    history = model.fit_generator(train_generator,
                                  epochs=epochs,
                                  shuffle=True,
                                  validation_data=val_generator,
                                  steps_per_epoch=120,
                                  callbacks=[checkpoint])

This model gives the following accuracy:

Epoch 1/25
  2/120 [..............................] - ETA: 1:31 - loss: 0.5271 - acc: 0.8281WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.2479s vs `on_train_batch_end` time: 0.6596s). Check your callbacks.
120/120 [==============================] - ETA: 0s - loss: 0.6356 - acc: 0.7914
Epoch 00001: val_acc improved from -inf to 0.77794, saving model to 25_classifier.h5
120/120 [==============================] - 167s 1s/step - loss: 0.6356 - acc: 0.7914 - val_loss: 0.6813 - val_acc: 0.7779
Epoch 2/25
120/120 [==============================] - ETA: 0s - loss: 0.6415 - acc: 0.7880
Epoch 00002: val_acc improved from 0.77794 to 0.78278, saving model to 25_classifier.h5
120/120 [==============================] - 194s 2s/step - loss: 0.6415 - acc: 0.7880 - val_loss: 0.6530 - val_acc: 0.7828
Epoch 3/25
120/120 [==============================] - ETA: 0s - loss: 0.6485 - acc: 0.7888
Epoch 00003: val_acc did not improve from 0.78278
120/120 [==============================] - 196s 2s/step - loss: 0.6485 - acc: 0.7888 - val_loss: 0.6473 - val_acc: 0.7796
Epoch 4/25
120/120 [==============================] - ETA: 0s - loss: 0.5914 - acc: 0.8073
Epoch 00004: val_acc did not improve from 0.78278
120/120 [==============================] - 197s 2s/step - loss: 0.5914 - acc: 0.8073 - val_loss: 0.6690 - val_acc: 0.7822
Epoch 5/25
120/120 [==============================] - ETA: 0s - loss: 0.5895 - acc: 0.8033
Epoch 00005: val_acc improved from 0.78278 to 0.78791, saving model to 25_classifier.h5
120/120 [==============================] - 198s 2s/step - loss: 0.5895 - acc: 0.8033 - val_loss: 0.6388 - val_acc: 0.7879
Epoch 6/25
120/120 [==============================] - ETA: 0s - loss: 0.6060 - acc: 0.7968
Epoch 00006: val_acc did not improve from 0.78791
120/120 [==============================] - 200s 2s/step - loss: 0.6060 - acc: 0.7968 - val_loss: 0.6338 - val_acc: 0.7873
Epoch 7/25
120/120 [==============================] - ETA: 0s - loss: 0.6043 - acc: 0.7964
Epoch 00007: val_acc did not improve from 0.78791
120/120 [==============================] - 198s 2s/step - loss: 0.6043 - acc: 0.7964 - val_loss: 0.6574 - val_acc: 0.7839
Epoch 8/25
120/120 [==============================] - ETA: 0s - loss: 0.6202 - acc: 0.7969
Epoch 00008: val_acc did not improve from 0.78791
120/120 [==============================] - 197s 2s/step - loss: 0.6202 - acc: 0.7969 - val_loss: 0.6812 - val_acc: 0.7785
Epoch 9/25
120/120 [==============================] - ETA: 0s - loss: 0.5965 - acc: 0.7990
Epoch 00009: val_acc improved from 0.78791 to 0.79247, saving model to 25_classifier.h5
120/120 [==============================] - 194s 2s/step - loss: 0.5965 - acc: 0.7990 - val_loss: 0.6404 - val_acc: 0.7925
Epoch 10/25
120/120 [==============================] - ETA: 0s - loss: 0.5999 - acc: 0.8010
Epoch 00010: val_acc did not improve from 0.79247
120/120 [==============================] - 195s 2s/step - loss: 0.5999 - acc: 0.8010 - val_loss: 0.6558 - val_acc: 0.7836
Epoch 11/25
120/120 [==============================] - ETA: 0s - loss: 0.5878 - acc: 0.8068
Epoch 00011: val_acc did not improve from 0.79247
120/120 [==============================] - 199s 2s/step - loss: 0.5878 - acc: 0.8068 - val_loss: 0.6601 - val_acc: 0.7842
Epoch 12/25
120/120 [==============================] - ETA: 0s - loss: 0.5592 - acc: 0.8104
Epoch 00012: val_acc did not improve from 0.79247
120/120 [==============================] - 200s 2s/step - loss: 0.5592 - acc: 0.8104 - val_loss: 0.6473 - val_acc: 0.7899
Epoch 13/25
120/120 [==============================] - ETA: 0s - loss: 0.5719 - acc: 0.8052
Epoch 00013: val_acc did not improve from 0.79247
120/120 [==============================] - 200s 2s/step - loss: 0.5719 - acc: 0.8052 - val_loss: 0.6539 - val_acc: 0.7802
Epoch 14/25
120/120 [==============================] - ETA: 0s - loss: 0.5697 - acc: 0.8104
Epoch 00014: val_acc did not improve from 0.79247
120/120 [==============================] - 196s 2s/step - loss: 0.5697 - acc: 0.8104 - val_loss: 0.6640 - val_acc: 0.7719
Epoch 15/25
120/120 [==============================] - ETA: 0s - loss: 0.5615 - acc: 0.8141
Epoch 00015: val_acc did not improve from 0.79247
120/120 [==============================] - 192s 2s/step - loss: 0.5615 - acc: 0.8141 - val_loss: 0.6762 - val_acc: 0.7680
Epoch 16/25
120/120 [==============================] - ETA: 0s - loss: 0.5502 - acc: 0.8148
Epoch 00016: val_acc did not improve from 0.79247
120/120 [==============================] - 195s 2s/step - loss: 0.5502 - acc: 0.8148 - val_loss: 0.6522 - val_acc: 0.7871
Epoch 17/25
120/120 [==============================] - ETA: 0s - loss: 0.5348 - acc: 0.8302
Epoch 00017: val_acc did not improve from 0.79247
120/120 [==============================] - 203s 2s/step - loss: 0.5348 - acc: 0.8302 - val_loss: 0.6682 - val_acc: 0.7885
Epoch 18/25
120/120 [==============================] - ETA: 0s - loss: 0.5709 - acc: 0.8115
Epoch 00018: val_acc improved from 0.79247 to 0.79647, saving model to 25_classifier.h5
120/120 [==============================] - 201s 2s/step - loss: 0.5709 - acc: 0.8115 - val_loss: 0.6203 - val_acc: 0.7965
Epoch 19/25
120/120 [==============================] - ETA: 0s - loss: 0.5061 - acc: 0.8380
Epoch 00019: val_acc did not improve from 0.79647
120/120 [==============================] - 200s 2s/step - loss: 0.5061 - acc: 0.8380 - val_loss: 0.7082 - val_acc: 0.7888
Epoch 20/25
120/120 [==============================] - ETA: 0s - loss: 0.5309 - acc: 0.8260
Epoch 00020: val_acc did not improve from 0.79647
120/120 [==============================] - 201s 2s/step - loss: 0.5309 - acc: 0.8260 - val_loss: 0.6347 - val_acc: 0.7868
Epoch 21/25
120/120 [==============================] - ETA: 0s - loss: 0.5303 - acc: 0.8271
Epoch 00021: val_acc did not improve from 0.79647
120/120 [==============================] - 199s 2s/step - loss: 0.5303 - acc: 0.8271 - val_loss: 0.6654 - val_acc: 0.7876
Epoch 22/25
120/120 [==============================] - ETA: 0s - loss: 0.5481 - acc: 0.8193
Epoch 00022: val_acc did not improve from 0.79647
120/120 [==============================] - 198s 2s/step - loss: 0.5481 - acc: 0.8193 - val_loss: 0.6677 - val_acc: 0.7737
Epoch 23/25
120/120 [==============================] - ETA: 0s - loss: 0.5360 - acc: 0.8198
Epoch 00023: val_acc did not improve from 0.79647
120/120 [==============================] - 202s 2s/step - loss: 0.5360 - acc: 0.8198 - val_loss: 0.6521 - val_acc: 0.7948
Epoch 24/25
120/120 [==============================] - ETA: 0s - loss: 0.4920 - acc: 0.8383
Epoch 00024: val_acc improved from 0.79647 to 0.79704, saving model to 25_classifier.h5
120/120 [==============================] - 200s 2s/step - loss: 0.4920 - acc: 0.8383 - val_loss: 0.6370 - val_acc: 0.7970
Epoch 25/25
120/120 [==============================] - ETA: 0s - loss: 0.5045 - acc: 0.8299
Epoch 00025: val_acc did not improve from 0.79704
120/120 [==============================] - 200s 2s/step - loss: 0.5045 - acc: 0.8299 - val_loss: 0.6357 - val_acc: 0.7916

The val loss is not decreasing and, likewise, the val accuracy is not increasing. I applied a dropout layer but the results got worse; then I applied L1 and L2 regularization but the accuracy was only 77.4%. I want at least 90% to 95% accuracy. Please help me, I am badly stuck.
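A minimal sketch, assuming the dropout layer and the L1/L2 regularization mentioned above were attached to the classification head roughly as follows (the dropout rate and regularization factors are placeholders, not the values actually used):

    from tensorflow.keras import layers, models, regularizers

    model = models.Sequential()
    model.add(conv_base)
    model.add(layers.Dropout(0.3))  # placeholder rate, not the value actually used
    model.add(layers.Dense(nb_categories, activation='softmax',
                           kernel_regularizer=regularizers.l1_l2(l1=1e-5, l2=1e-4)))  # placeholder factors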

An imbalanced dataset is a common problem. For example, in your case, if your model only ever predicted level 0 it would be correct about 20000/35000 (roughly 57%) of the time. There are many ways to deal with this. The most obvious, of course, is to find more samples for the under-represented classes; unfortunately that is usually not feasible.

The next thing to try is undersampling. In your case, since you have a fairly large dataset, remove a certain percentage of the images from the over-represented class, say 50% of the images from the level 0 directory. Another approach is to oversample the under-represented classes. Here you try to create "augmented" images for them: you can use an image-processing module such as cv2 to create augmented images and store them in the directories of the under-represented classes, for example by flipping images horizontally, changing the brightness, etc. (a small sketch of this is shown after the class-weight function below).

Tensorflow has a parameter in model.fit called class_weight to help deal with imbalance; the documentation is here. What it does is give the under-represented classes more weight when the loss function is computed. Below is the code for a function I developed to determine the class weights. Set the parameter dir to point to your training directory.

import os

def get_weight_dict(dir):
    most_samples = 0
    class_weight = {}
    # dir is the directory with the training samples organized by class;
    # sort the class sub-directories so the indices line up with Keras's ordering
    class_list = sorted(c for c in os.listdir(dir)
                        if os.path.isdir(os.path.join(dir, c)))
    # first pass: find the class with the highest number of samples
    for c in class_list:
        length = len(os.listdir(os.path.join(dir, c)))  # number of samples in the class directory
        if length > most_samples:
            most_samples = length
    # second pass: weight each class by most_samples divided by its own sample count
    for i, c in enumerate(class_list):
        length = len(os.listdir(os.path.join(dir, c)))
        class_weight[i] = most_samples / length
        # print(i, most_samples, class_weight[i])
    return class_weight
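To illustrate the oversampling-by-augmentation idea mentioned above, here is a minimal sketch (my own addition, not part of the original answer) that uses cv2 to write horizontally flipped, brightness-shifted copies into an under-represented class directory until it roughly reaches a target count; the helper name and the file-extension filter are assumptions:

import os
import cv2
import numpy as np

def augment_class_dir(class_dir, target_count):
    # hypothetical helper: pad an under-represented class directory up to
    # target_count images by writing flipped / brightness-shifted copies
    files = [f for f in os.listdir(class_dir)
             if f.lower().endswith(('.png', '.jpg', '.jpeg'))]
    i = 0
    while files and len(os.listdir(class_dir)) < target_count:
        src = os.path.join(class_dir, files[i % len(files)])
        img = cv2.imread(src)
        if img is not None:
            aug = cv2.flip(img, 1)                                # horizontal flip
            beta = int(np.random.randint(-30, 31))                # random brightness shift
            aug = cv2.convertScaleAbs(aug, alpha=1.0, beta=beta)
            cv2.imwrite(os.path.join(class_dir, 'aug_%d_%s' % (i, files[i % len(files)])), aug)
        i += 1
        if i > 10 * target_count:                                 # safety stop for unreadable files
            break

Calling it with the sample count of the largest class as target_count for each minority-class directory, before building the generators, would roughly balance the on-disk class counts.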

Then, in model.fit, set the class_weight parameter to the result of the function above. Frankly, I have not found this to be very effective.

Another thing I noticed is that you are using ModelCheckpoint. It saves the model with the best monitored value (val_acc in your code) to a location on disk, but to use it you have to load the saved model and then make predictions. I think you are better off using the EarlyStopping callback, because if you set its parameter restore_best_weights=True you do not have to load the model; just set the number of epochs high enough that this callback gets triggered. The documentation for the callback is here. I also suggest you consider the additional callback ReduceLROnPlateau, which automatically reduces the learning rate if the monitored value stops improving. My recommended code for these callbacks is shown below.

import tensorflow as tf

rlronp = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                                              patience=1, verbose=1)
es = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3, verbose=1,
                                      restore_best_weights=True)
callbacks = [rlronp, es]
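Tying this together, a usage sketch (my own, assuming the model, generators and train_data_dir from the question are in scope; the epoch count is arbitrary, just high enough for EarlyStopping to end training):

# class weights from the training directory defined in the question
class_weight = get_weight_dict(train_data_dir)

history = model.fit(train_generator,
                    epochs=40,                      # deliberately high; EarlyStopping stops training
                    validation_data=val_generator,
                    class_weight=class_weight,
                    callbacks=callbacks)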

One final comment. I am not a fan of VGG models because they have on the order of 40 million trainable parameters, so they are computationally expensive and lead to longer training times. I use the MobileNetV2 model, which has only about 4 million trainable parameters and is about as accurate. You can use it in the same way as the VGG model.
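As a rough sketch of that swap (my own, assuming the rest of the pipeline from the question stays the same), only the backbone changes; note that MobileNetV2 expects inputs scaled to [-1, 1], so its own preprocess_input is usually used instead of a plain 1/255 rescale:

from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNetV2, mobilenet_v2

conv_base = MobileNetV2(weights='imagenet', include_top=False,
                        pooling='max', input_shape=(img_width, img_height, 3))
conv_base.trainable = True

model = models.Sequential([
    conv_base,
    layers.Dense(nb_categories, activation='softmax')
])

# e.g. ImageDataGenerator(preprocessing_function=mobilenet_v2.preprocess_input)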
