Low VGG16 validation accuracy on a biased dataset

Question

I am new to machine learning. I have a retinal image dataset of about 35K images across 5 different labels.

The VGG16 model I used for training is:

    import tensorflow
    from tensorflow.keras import layers, models
    from tensorflow.keras.applications import vgg16
    from tensorflow.keras.callbacks import ModelCheckpoint
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # nb_categories, train_data_dir, val_data_dir and test_data_dir are defined elsewhere
    img_height, img_width = 224, 224
    conv_base = vgg16.VGG16(weights='imagenet', include_top=False, pooling='max', input_shape=(img_width, img_height, 3))
    # make every layer trainable and print its status
    for layer in conv_base.layers:
        layer.trainable = True
        print(layer, layer.trainable)
    model = models.Sequential()
    model.add(conv_base)
    model.add(layers.Dense(nb_categories, activation='softmax'))
    model.summary()
    # number of images to load at each iteration
    batch_size = 32
    # only rescaling
    train_datagen = ImageDataGenerator(rescale=1./255)
    test_datagen = ImageDataGenerator(rescale=1./255)
    # generators for train/validation/test data that read pictures
    # found in the defined subfolders of 'data/'
    print('Total number of images for "training":')
    train_generator = train_datagen.flow_from_directory(
        train_data_dir,
        target_size=(img_height, img_width),
        batch_size=batch_size,
        class_mode="categorical")
    print('Total number of images for "validation":')
    val_generator = test_datagen.flow_from_directory(
        val_data_dir,
        target_size=(img_height, img_width),
        batch_size=batch_size,
        class_mode="categorical",
        shuffle=False)
    print('Total number of images for "testing":')
    test_generator = test_datagen.flow_from_directory(
        test_data_dir,
        target_size=(img_height, img_width),
        batch_size=batch_size,
        class_mode="categorical",
        shuffle=False)
    learning_rate = 5e-5
    epochs = 25
    checkpoint = ModelCheckpoint("25_classifier.h5", monitor='val_acc', verbose=1, save_best_only=True, save_weights_only=False, mode='auto', period=1)
    model.compile(loss="categorical_crossentropy", optimizer=tensorflow.optimizers.Adam(lr=learning_rate, clipnorm=1., epsilon=1e-8), metrics=['acc'])
    history = model.fit_generator(train_generator,
                                  epochs=epochs,
                                  shuffle=True,
                                  validation_data=val_generator,
                                  steps_per_epoch=120,
                                  callbacks=[checkpoint])

This model gives the following accuracy:

Epoch 1/25
  2/120 [..............................] - ETA: 1:31 - loss: 0.5271 - acc: 0.8281WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.2479s vs `on_train_batch_end` time: 0.6596s). Check your callbacks.
120/120 [==============================] - ETA: 0s - loss: 0.6356 - acc: 0.7914
Epoch 00001: val_acc improved from -inf to 0.77794, saving model to 25_classifier.h5
120/120 [==============================] - 167s 1s/step - loss: 0.6356 - acc: 0.7914 - val_loss: 0.6813 - val_acc: 0.7779
Epoch 2/25
120/120 [==============================] - ETA: 0s - loss: 0.6415 - acc: 0.7880
Epoch 00002: val_acc improved from 0.77794 to 0.78278, saving model to 25_classifier.h5
120/120 [==============================] - 194s 2s/step - loss: 0.6415 - acc: 0.7880 - val_loss: 0.6530 - val_acc: 0.7828
Epoch 3/25
120/120 [==============================] - ETA: 0s - loss: 0.6485 - acc: 0.7888
Epoch 00003: val_acc did not improve from 0.78278
120/120 [==============================] - 196s 2s/step - loss: 0.6485 - acc: 0.7888 - val_loss: 0.6473 - val_acc: 0.7796
Epoch 4/25
120/120 [==============================] - ETA: 0s - loss: 0.5914 - acc: 0.8073
Epoch 00004: val_acc did not improve from 0.78278
120/120 [==============================] - 197s 2s/step - loss: 0.5914 - acc: 0.8073 - val_loss: 0.6690 - val_acc: 0.7822
Epoch 5/25
120/120 [==============================] - ETA: 0s - loss: 0.5895 - acc: 0.8033
Epoch 00005: val_acc improved from 0.78278 to 0.78791, saving model to 25_classifier.h5
120/120 [==============================] - 198s 2s/step - loss: 0.5895 - acc: 0.8033 - val_loss: 0.6388 - val_acc: 0.7879
Epoch 6/25
120/120 [==============================] - ETA: 0s - loss: 0.6060 - acc: 0.7968
Epoch 00006: val_acc did not improve from 0.78791
120/120 [==============================] - 200s 2s/step - loss: 0.6060 - acc: 0.7968 - val_loss: 0.6338 - val_acc: 0.7873
Epoch 7/25
120/120 [==============================] - ETA: 0s - loss: 0.6043 - acc: 0.7964
Epoch 00007: val_acc did not improve from 0.78791
120/120 [==============================] - 198s 2s/step - loss: 0.6043 - acc: 0.7964 - val_loss: 0.6574 - val_acc: 0.7839
Epoch 8/25
120/120 [==============================] - ETA: 0s - loss: 0.6202 - acc: 0.7969
Epoch 00008: val_acc did not improve from 0.78791
120/120 [==============================] - 197s 2s/step - loss: 0.6202 - acc: 0.7969 - val_loss: 0.6812 - val_acc: 0.7785
Epoch 9/25
120/120 [==============================] - ETA: 0s - loss: 0.5965 - acc: 0.7990
Epoch 00009: val_acc improved from 0.78791 to 0.79247, saving model to 25_classifier.h5
120/120 [==============================] - 194s 2s/step - loss: 0.5965 - acc: 0.7990 - val_loss: 0.6404 - val_acc: 0.7925
Epoch 10/25
120/120 [==============================] - ETA: 0s - loss: 0.5999 - acc: 0.8010
Epoch 00010: val_acc did not improve from 0.79247
120/120 [==============================] - 195s 2s/step - loss: 0.5999 - acc: 0.8010 - val_loss: 0.6558 - val_acc: 0.7836
Epoch 11/25
120/120 [==============================] - ETA: 0s - loss: 0.5878 - acc: 0.8068
Epoch 00011: val_acc did not improve from 0.79247
120/120 [==============================] - 199s 2s/step - loss: 0.5878 - acc: 0.8068 - val_loss: 0.6601 - val_acc: 0.7842
Epoch 12/25
120/120 [==============================] - ETA: 0s - loss: 0.5592 - acc: 0.8104
Epoch 00012: val_acc did not improve from 0.79247
120/120 [==============================] - 200s 2s/step - loss: 0.5592 - acc: 0.8104 - val_loss: 0.6473 - val_acc: 0.7899
Epoch 13/25
120/120 [==============================] - ETA: 0s - loss: 0.5719 - acc: 0.8052
Epoch 00013: val_acc did not improve from 0.79247
120/120 [==============================] - 200s 2s/step - loss: 0.5719 - acc: 0.8052 - val_loss: 0.6539 - val_acc: 0.7802
Epoch 14/25
120/120 [==============================] - ETA: 0s - loss: 0.5697 - acc: 0.8104
Epoch 00014: val_acc did not improve from 0.79247
120/120 [==============================] - 196s 2s/step - loss: 0.5697 - acc: 0.8104 - val_loss: 0.6640 - val_acc: 0.7719
Epoch 15/25
120/120 [==============================] - ETA: 0s - loss: 0.5615 - acc: 0.8141
Epoch 00015: val_acc did not improve from 0.79247
120/120 [==============================] - 192s 2s/step - loss: 0.5615 - acc: 0.8141 - val_loss: 0.6762 - val_acc: 0.7680
Epoch 16/25
120/120 [==============================] - ETA: 0s - loss: 0.5502 - acc: 0.8148
Epoch 00016: val_acc did not improve from 0.79247
120/120 [==============================] - 195s 2s/step - loss: 0.5502 - acc: 0.8148 - val_loss: 0.6522 - val_acc: 0.7871
Epoch 17/25
120/120 [==============================] - ETA: 0s - loss: 0.5348 - acc: 0.8302
Epoch 00017: val_acc did not improve from 0.79247
120/120 [==============================] - 203s 2s/step - loss: 0.5348 - acc: 0.8302 - val_loss: 0.6682 - val_acc: 0.7885
Epoch 18/25
120/120 [==============================] - ETA: 0s - loss: 0.5709 - acc: 0.8115
Epoch 00018: val_acc improved from 0.79247 to 0.79647, saving model to 25_classifier.h5
120/120 [==============================] - 201s 2s/step - loss: 0.5709 - acc: 0.8115 - val_loss: 0.6203 - val_acc: 0.7965
Epoch 19/25
120/120 [==============================] - ETA: 0s - loss: 0.5061 - acc: 0.8380
Epoch 00019: val_acc did not improve from 0.79647
120/120 [==============================] - 200s 2s/step - loss: 0.5061 - acc: 0.8380 - val_loss: 0.7082 - val_acc: 0.7888
Epoch 20/25
120/120 [==============================] - ETA: 0s - loss: 0.5309 - acc: 0.8260
Epoch 00020: val_acc did not improve from 0.79647
120/120 [==============================] - 201s 2s/step - loss: 0.5309 - acc: 0.8260 - val_loss: 0.6347 - val_acc: 0.7868
Epoch 21/25
120/120 [==============================] - ETA: 0s - loss: 0.5303 - acc: 0.8271
Epoch 00021: val_acc did not improve from 0.79647
120/120 [==============================] - 199s 2s/step - loss: 0.5303 - acc: 0.8271 - val_loss: 0.6654 - val_acc: 0.7876
Epoch 22/25
120/120 [==============================] - ETA: 0s - loss: 0.5481 - acc: 0.8193
Epoch 00022: val_acc did not improve from 0.79647
120/120 [==============================] - 198s 2s/step - loss: 0.5481 - acc: 0.8193 - val_loss: 0.6677 - val_acc: 0.7737
Epoch 23/25
120/120 [==============================] - ETA: 0s - loss: 0.5360 - acc: 0.8198
Epoch 00023: val_acc did not improve from 0.79647
120/120 [==============================] - 202s 2s/step - loss: 0.5360 - acc: 0.8198 - val_loss: 0.6521 - val_acc: 0.7948
Epoch 24/25
120/120 [==============================] - ETA: 0s - loss: 0.4920 - acc: 0.8383
Epoch 00024: val_acc improved from 0.79647 to 0.79704, saving model to 25_classifier.h5
120/120 [==============================] - 200s 2s/step - loss: 0.4920 - acc: 0.8383 - val_loss: 0.6370 - val_acc: 0.7970
Epoch 25/25
120/120 [==============================] - ETA: 0s - loss: 0.5045 - acc: 0.8299
Epoch 00025: val_acc did not improve from 0.79704
120/120 [==============================] - 200s 2s/step - loss: 0.5045 - acc: 0.8299 - val_loss: 0.6357 - val_acc: 0.7916

The validation loss does not decrease, and likewise the validation accuracy does not increase. I added a dropout layer, but the results got worse; then I applied L1 and L2 regularization, but that gave only 77.4% accuracy. I want at least 90% to 95% accuracy. Please help me, I am badly stuck.

Answer 1

Imbalanced datasets are a common problem. For example, in your case, if your model just predicts level 0 it will be right 20000/35000 of the time, or about 57%. There are a number of ways to deal with the problem. The most obvious, of course, is to find more samples for the under-represented classes. Unfortunately, that is often not feasible.

The next thing to try is under-sampling. Since you have a reasonably large dataset, remove some percentage of the images from the over-represented class. In this case, say, remove 50% of the images from the level 0 directory.

Another method is to over-sample the under-represented classes. Here you create augmented images for the under-represented classes. You can use an image processing module, for example cv2, to create the augmented images and store them in the directories for the under-represented classes. For example, you can flip the images horizontally, change their brightness, and so on.

TensorFlow has a parameter in model.fit called class_weight to help deal with the imbalance; it is described in the Keras documentation for model.fit. It gives more weight to the under-represented classes when calculating the loss function. Below is the code for a function I developed to determine the class weights. Set the parameter dir to point to your training directory.

import os

def get_weight_dict(dir):
    # dir is the directory with the training samples organized by class
    most_samples = 0
    class_weight = {}
    # keep only class directories, sorted so that the indices match the
    # alphanumeric class order used by flow_from_directory
    class_list = sorted(c for c in os.listdir(dir) if os.path.isdir(os.path.join(dir, c)))
    # first pass: find the number of samples in the largest class
    for c in class_list:
        length = len(os.listdir(os.path.join(dir, c)))
        if length > most_samples:
            most_samples = length
    # second pass: weight each class by most_samples divided by its own sample count
    for i, c in enumerate(class_list):
        length = len(os.listdir(os.path.join(dir, c)))
        class_weight[i] = most_samples / length
    return class_weight
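
To make the result concrete, here is a minimal sketch of the same weighting rule applied to per-class counts, together with one way the dictionary could be passed to Keras. The helper names `weights_from_counts` and `fit_with_class_weights` and the counts in the note below are hypothetical. `class_indices` is the mapping that `flow_from_directory` generators expose, and the check against it is an added safeguard, not part of the function above.

```python
def weights_from_counts(counts):
    """Map class index -> weight, where counts maps class index -> image count."""
    most_samples = max(counts.values())
    # the largest class gets weight 1.0, smaller classes get proportionally more
    return {idx: most_samples / n for idx, n in counts.items()}


def fit_with_class_weights(model, train_generator, val_generator, class_weight, epochs=25):
    """Train a Keras-style model with per-class loss weights (hypothetical helper)."""
    # class_indices maps each class directory name to the integer label the generator emits
    expected = set(train_generator.class_indices.values())
    if set(class_weight) != expected:
        raise ValueError(
            f"class_weight keys {sorted(class_weight)} do not match labels {sorted(expected)}"
        )
    return model.fit(
        train_generator,
        validation_data=val_generator,
        epochs=epochs,
        class_weight=class_weight,
    )
```

With hypothetical counts such as `{0: 20000, 1: 5000}`, `weights_from_counts` returns `{0: 1.0, 1: 4.0}`, so an error on a class 1 image counts four times as much in the loss as an error on a class 0 image.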

Then in model.fit set the class_weight parameter to the result of the function. Frankly, I have not found this to be very effective.

The other thing I notice is that you are using ModelCheckpoint. It saves the model with the best value of the monitored metric (val_acc in your code) to a location on disk. However, to use it you must load the saved model and THEN do predictions. I think you would be better off using the EarlyStopping callback, because if you set its parameter restore_best_weights=True you won't have to load the model. Just set the number of epochs high enough that this callback activates. See the Keras documentation for callbacks. I also recommend you consider using an additional callback, ReduceLROnPlateau. This callback reduces the learning rate automatically if the value being monitored does not improve. My recommended code for these callbacks is shown below.

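import tensorflow as tf

# halve the learning rate when val_loss fails to improve for 1 epoch
rlronp = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=1, verbose=1)
# stop once val_loss has not improved for 3 epochs and restore the best weights
es = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3, verbose=1, restore_best_weights=True)
callbacks = [rlronp, es]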
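
To see what `patience=3` means in practice, the sketch below replays a simplified version of the early-stopping rule over the first ten `val_loss` values from the log in the question. The function `simulate_early_stopping` is hypothetical and is not the Keras implementation. It also ignores the effect that ReduceLROnPlateau would have had on the losses, so it only illustrates the stopping rule.

```python
def simulate_early_stopping(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), 1-indexed; stop_epoch is None if never triggered."""
    best = float("inf")
    best_epoch = None
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch


# val_loss for epochs 1-10, copied from the training log in the question
logged = [0.6813, 0.6530, 0.6473, 0.6690, 0.6388, 0.6338, 0.6574, 0.6812, 0.6404, 0.6558]
print(simulate_early_stopping(logged))  # (9, 6)
```

Under this rule the run would have stopped after epoch 9 and, with `restore_best_weights=True`, kept the weights from epoch 6.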

One final comment. I do not like VGG models because they are large: VGG16 has about 138 million parameters in its full form, and about 14.7 million even with include_top=False as in your code, so it has a heavy computational cost and a longer training time. I use the MobileNetV2 model, which has about 3.5 million parameters and is about as accurate. You use it just like you used the VGG model.
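
For reference, the size of VGG16 can be checked by hand. The sketch below counts the weights and biases of the 13 convolutional layers (3x3 kernels) and the three fully connected layers of the standard VGG16 architecture. The function names are hypothetical, and the layer widths are those of the published architecture rather than anything stated in the question.

```python
# (input channels, output channels) of the 13 conv layers in VGG16, all with 3x3 kernels
CONV_LAYERS = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]
# (inputs, outputs) of the fully connected ImageNet head dropped by include_top=False
DENSE_LAYERS = [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]


def conv_params(c_in, c_out, kernel=3):
    return kernel * kernel * c_in * c_out + c_out  # weights plus one bias per filter


def dense_params(n_in, n_out):
    return n_in * n_out + n_out  # weights plus one bias per unit


conv_total = sum(conv_params(i, o) for i, o in CONV_LAYERS)
full_total = conv_total + sum(dense_params(i, o) for i, o in DENSE_LAYERS)
print(conv_total)  # 14714688
print(full_total)  # 138357544
```

So the convolutional base carries about 14.7 million parameters on its own, and a 5-class softmax layer on top of its 512 pooled features adds only `dense_params(512, 5)` = 2565 more.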
