Validation set only gets images from one class when using keras ImageDataGenerator flow_from_dataframe

I have a list of images, each with the class it belongs to, in this format:

list.txt

image1 good
image2 good
image3 good
.
.
.
image4 bad
image5 bad
image6 bad

I used ImageDataGenerator to split off the validation data:

train_datagen = ImageDataGenerator(rescale=1./255, validation_split = 0.25)

I used pandas to read the file into a dataframe:

import pandas as pd

load_images = pd.read_csv("list.txt", delim_whitespace = True, header = None)
load_images.columns = ['filename','class']
load_images.columns = load_images.columns.str.strip()

trainDataFrame = load_images

I used flow_from_dataframe to create the training and validation generators:

train_generator = train_datagen.flow_from_dataframe(
        trainDataFrame,
        x_col = 'filename',
        y_col = 'class',
        directory = path_to_parent_folder_of_images,
        target_size=(inputHeight, inputWidth),
        batch_size=batch_size,
        class_mode='categorical',
        subset = 'training',
        save_to_dir = "path_to_folder\\training",
        shuffle = True)

validation_generator = train_datagen.flow_from_dataframe(
        trainDataFrame,
        x_col = 'filename',
        y_col = 'class',
        directory = path_to_parent_folder_of_images,
        target_size=(inputHeight, inputWidth),
        batch_size=batch_size,
        class_mode='categorical',
        subset= 'validation',
        save_to_dir = "path_to_folder\\validation",
        shuffle = True)

Finally, I train the model:

model.fit_generator(
    train_generator,
    steps_per_epoch = train_generator.n // train_generator.batch_size,
    epochs = epochs,
    validation_data = validation_generator,
    validation_steps = validation_generator.n // validation_generator.batch_size,
    callbacks = callback_list)        

The problem is that the validation set only contains images from the class bad; there are no images of the other class. I used the save_to_dir parameter to write the images to disk, and I only see images from one class. The training generator seems fine (it has images of both good and bad). Because of this, my validation accuracy is always 0 or 1. I have seen examples online and tried to follow them. Nobody else seems to face this problem, so I am not sure what I am doing incorrectly.
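A quicker check than inspecting saved images is to count the labels each generator holds. The sketch below assumes single-label data, where the generator's classes attribute is a list with one integer label per image and class_indices maps class names to those integers. The helper name count_per_class and the stand-in values are hypothetical and only illustrate the idea.

```python
from collections import Counter

def count_per_class(classes, class_indices):
    """Return {class_name: image_count} for one generator subset.

    `classes` is a sequence of integer labels and `class_indices` maps
    class name -> integer label.
    """
    index_to_name = {index: name for name, index in class_indices.items()}
    counts = Counter(int(label) for label in classes)
    # Report every known class, including classes with zero images.
    return {name: counts.get(index, 0)
            for index, name in sorted(index_to_name.items())}

# Stand-in values (assumption). With real generators one would pass
# validation_generator.classes and validation_generator.class_indices.
example_counts = count_per_class([0, 0, 0], {"bad": 0, "good": 1})
print(example_counts)
```

A count of zero for any class in the validation subset points to the same problem described above.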

I am using these versions:

python - 3.7.4
tensorflow - 2.0.0
keras - 2.3.1

I realized that flow_from_dataframe() takes the first 25% of the images in the list for validation instead of choosing them randomly. My list is sorted, meaning all the good images are together and all the bad images are together, so the first 25% of the rows, which went to the validation set, always came from a single class. Setting shuffle = True does not help here, because it only changes the order in which each subset is iterated, not which rows end up in which subset. I used

from sklearn.utils import shuffle
trainDataFrame = shuffle(trainDataFrame)

to shuffle the dataframe before passing it to flow_from_dataframe(), and that solved the problem.
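The effect can be reproduced without Keras. The sketch below uses a small made-up dataframe and a hypothetical helper, first_fraction, that mimics a positional split by taking the first 25% of the rows. It shuffles with pandas' DataFrame.sample rather than sklearn.utils.shuffle; this is a swapped-in alternative, chosen only so the example depends on pandas alone.

```python
import pandas as pd

# Made-up sorted listing (assumption): 6 "good" rows, then 6 "bad" rows.
frame = pd.DataFrame({
    "filename": [f"image{i}" for i in range(1, 13)],
    "class": ["good"] * 6 + ["bad"] * 6,
})

def first_fraction(df, fraction):
    """Mimic a positional split: return the first `fraction` of the rows."""
    stop = int(len(df) * fraction)
    return df.iloc[:stop]

# On the sorted frame, the first 25% contains a single class.
sorted_val_classes = set(first_fraction(frame, 0.25)["class"])

# Shuffle all rows once; a fixed seed keeps the split reproducible.
shuffled = frame.sample(frac=1, random_state=0).reset_index(drop=True)
shuffled_val = first_fraction(shuffled, 0.25)

print(sorted_val_classes)
print(shuffled_val["class"].value_counts().to_dict())
```

Shuffling makes a mixed validation set likely but does not guarantee the class proportions. For small or imbalanced datasets, a stratified split may be a better fit.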
