Accuracy drops when setting shuffle to True in Keras fit_generator

Question

The data I am working with is very imbalanced.

I am training an image classifier using VGG16. I froze all the layers in VGG16 except the last two fully connected layers.

BATCH_SIZE = 128

EPOCHS = 80

When I set shuffle = False, the precision and recall for each class are very high (between 0.80 and 0.90), but when I set shuffle = True, the precision and recall for each class drop to 0.10-0.20. I am not sure what is going on. Can someone please help?

Below is the code:

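from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Dropout, Flatten, Input
from tensorflow.keras.models import Model

img_size = 224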
trainGen = trainAug.flow_from_directory(
    trainPath,
    class_mode="categorical",
    target_size=(img_size, img_size),
    color_mode="rgb",
    shuffle=False,
    batch_size=BATCH_SIZE)
valGen = valAug.flow_from_directory(
    valPath,
    class_mode="categorical",
    target_size=(img_size, img_size),
    color_mode="rgb",
    shuffle=False,
    batch_size=BATCH_SIZE)

testGen = valAug.flow_from_directory(
    testPath,
    class_mode="categorical",
    target_size=(img_size, img_size),
    color_mode="rgb",
    shuffle=False,
    batch_size=BATCH_SIZE)

baseModel = VGG16(weights="imagenet", include_top=False, input_tensor=Input(shape=(img_size, img_size, 3)))
headModel = baseModel.output
headModel = Flatten(name="flatten")(headModel)
headModel = Dense(512, activation="relu")(headModel)
headModel = Dropout(0.5)(headModel)
headModel = Dense(PFR_NUM_CLASS, activation="softmax")(headModel)
# place the head FC model on top of the base model (this will become
# the actual model we will train)
model = Model(inputs=baseModel.input, outputs=headModel)
# loop over all layers in the base model and freeze them so they will
# *not* be updated during the first training process
for layer in baseModel.layers:
    layer.trainable = False

The class weights are calculated as:

from sklearn.utils import class_weight
import numpy as np

# recent scikit-learn versions require classes= and y= as keyword arguments
class_weights = class_weight.compute_class_weight(
    'balanced',
    classes=np.unique(trainGen.classes),
    y=trainGen.classes)

These are the class weights:

array([0.18511007, 2.06740331, 1.00321716, 3.53018868, 2.48637874,
       2.27477204, 1.57557895, 6.68214286, 1.04233983, 4.02365591])
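For reference, scikit-learn's 'balanced' heuristic gives each class the weight n_samples / (n_classes * count_of_that_class), which is why the dominant class above gets 0.185 and the rare classes get weights well above 1. Keras documents its class_weight argument as a dict mapping integer class indices to weights, not a NumPy array. The sketch below reproduces the heuristic with NumPy only and returns such a dict; the helper name balanced_class_weights and the label array are hypothetical stand-ins for trainGen.classes, and it assumes class indices are the integers 0..n-1, as flow_from_directory assigns them.

```python
import numpy as np

def balanced_class_weights(labels):
    """Return {class_index: weight} with weight = n_samples / (n_classes * class_count)."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    # Keras' class_weight argument takes a dict keyed by integer class index
    return {int(c): float(w) for c, w in zip(classes, weights)}

# Hypothetical imbalanced labels: 8 samples of class 0, 2 samples of class 1
labels = np.array([0] * 8 + [1] * 2)
print(balanced_class_weights(labels))  # {0: 0.625, 1: 2.5}
```

With the array computed above, dict(enumerate(class_weights)) produces the same kind of dict.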

And the training code is:

from tensorflow.keras.optimizers import SGD

# compile our model (this needs to be done after setting our layers to being non-trainable)
print("[INFO] compiling model...")
opt = SGD(learning_rate=1e-5, momentum=0.8)
model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=["accuracy"])
# train the head of the network for a few epochs (all other layers
# are frozen) -- this will allow the new FC layers to start to become
# initialized with actual "learned" values versus pure random
print("[INFO] training head...")
H = model.fit_generator(
    trainGen,
    steps_per_epoch=totalTrain // BATCH_SIZE,
    validation_data=valGen,
    validation_steps=totalVal // BATCH_SIZE,
    epochs=EPOCHS,
    # Keras expects class_weight as a dict {class_index: weight}, not an array
    class_weight=dict(enumerate(class_weights)),
    verbose=1,
    callbacks=callbacks_list)
# reset the testing generator and evaluate the network after
# fine-tuning just the network head

Answer 1

In your case, the problem with setting shuffle=True is that if you shuffle your validation set, the results will be chaotic. A shuffled generator yields the images in random order, so the predictions come out in that random order, while the reference labels (valGen.classes) stay in directory order. A prediction can be correct and still be compared with the label at the wrong index, which produces misleading results, just as happened in your case.
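A small NumPy simulation makes the effect concrete (all data here is hypothetical): the "model" predicts every image perfectly, but the images are yielded in a shuffled order, while the reference labels, like DirectoryIterator.classes, stay in directory order. Comparing the two arrays position by position then measures little more than chance agreement, about 0.1 for 10 balanced classes, which is in the same ballpark as the 0.10-0.20 reported in the question.

```python
import numpy as np

rng = np.random.default_rng(0)

# Labels in directory order, like DirectoryIterator.classes: 10 classes, 50 images each
true_labels = np.repeat(np.arange(10), 50)

# A shuffling generator yields the images in a random permutation
order = rng.permutation(len(true_labels))

# Assume a perfect model: its prediction for each yielded image is that image's label
predictions = true_labels[order]

# Wrong: compare shuffled predictions against labels in directory order
misaligned_acc = np.mean(predictions == true_labels)

# Right: compare against labels taken in the order the images were yielded
aligned_acc = np.mean(predictions == true_labels[order])

print(f"misaligned: {misaligned_acc:.2f}, aligned: {aligned_acc:.2f}")
```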

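Always use shuffle=True on the training set and shuffle=False on the validation and test sets. On the training set, shuffling matters for a different reason: with shuffle=False, flow_from_directory yields the files class by class, so most batches contain images from a single class.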
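A minimal evaluation sketch along those lines, assuming a non-shuffled generator that exposes reset(), samples, batch_size and classes (as Keras' DirectoryIterator does) and a model whose predict() accepts that generator and a steps argument. The helper name evaluate_in_order is hypothetical. Because the generator is not shuffled and is reset first, row i of the predictions lines up with gen.classes[i].

```python
import math
import numpy as np

def evaluate_in_order(model, gen):
    """Per-class precision and recall for a NON-shuffled generator (hypothetical helper)."""
    gen.reset()  # start from the first file so predictions line up with gen.classes
    steps = math.ceil(gen.samples / gen.batch_size)
    probs = model.predict(gen, steps=steps)
    y_pred = np.argmax(probs, axis=1)[: gen.samples]  # drop any extra rows
    y_true = np.asarray(gen.classes)
    n_classes = int(max(y_true.max(), y_pred.max())) + 1
    precision, recall = np.zeros(n_classes), np.zeros(n_classes)
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        predicted_c = np.sum(y_pred == c)
        actual_c = np.sum(y_true == c)
        precision[c] = tp / predicted_c if predicted_c else 0.0
        recall[c] = tp / actual_c if actual_c else 0.0
    return precision, recall
```

Usage would be along the lines of precision, recall = evaluate_in_order(model, testGen). If scikit-learn is available, sklearn.metrics.classification_report(y_true, y_pred) reports the same per-class numbers with less code.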

Answer 1 (score 2, accepted), 2020-03-10 09:31:27
