Keras: issue using ImageDataGenerator and KFold for fit_generator
flow_from_directory(directory): this takes a directory, but it cannot take only a split (subset) of the training images.
sklearn.model_selection.KFold: provides the split indices of the images. These indices can be used with fit() but not with fit_generator().
How can KFold be used together with ImageDataGenerator? Is there a way?
At the moment, a dataset held in a folder cannot be split using a flow_from_directory generator; this option is simply not implemented. To get a train/test split, one needs to split the main directory into a set of train/test/val directories, e.g. using the os library in Python.
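A minimal sketch of that directory split, assuming the main directory holds one subfolder per class (e.g. data/cats and data/dogs, hypothetical names). It uses os to list files and, as an addition to the answer, shutil to copy them; the helper name split_directory and the default ratios are illustrative assumptions, not part of any library.

```python
import os
import random
import shutil


def split_directory(src_dir, dst_dir, ratios=(0.7, 0.15, 0.15), seed=42):
    """Copy images from src_dir/<class>/ into dst_dir/{train,val,test}/<class>/.

    Hypothetical helper: ratios are the (train, val, test) fractions per class.
    """
    rng = random.Random(seed)
    subsets = ("train", "val", "test")
    for class_name in sorted(os.listdir(src_dir)):
        class_dir = os.path.join(src_dir, class_name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files at the top level
        files = sorted(os.listdir(class_dir))
        rng.shuffle(files)  # random but reproducible order per class
        n_train = int(len(files) * ratios[0])
        n_val = int(len(files) * ratios[1])
        # Whatever is left after train and val goes to test.
        chunks = (files[:n_train],
                  files[n_train:n_train + n_val],
                  files[n_train + n_val:])
        for subset, chunk in zip(subsets, chunks):
            out_dir = os.path.join(dst_dir, subset, class_name)
            os.makedirs(out_dir, exist_ok=True)
            for name in chunk:
                shutil.copy2(os.path.join(class_dir, name), out_dir)
```

Each of dst_dir/train, dst_dir/val and dst_dir/test then has the class-subfolder layout that flow_from_directory expects, so one generator can be pointed at each.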
Assuming that you have a classification problem with 2 classes, I would do something like:
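from keras.preprocessing.image import ImageDataGenerator
from keras.utils import to_categorical
# One-hot encode the integer labels for the 2 classes
train_y = to_categorical(train_y, num_classes=2)
test_y = to_categorical(test_y, num_classes=2)
aug = ImageDataGenerator(...)  # your ImageDataGenerator settings
# train_x / test_x are in-memory NumPy image arrays; fit_generator returns a History object
history = model.fit_generator(aug.flow(train_x, train_y, batch_size=32),
                              steps_per_epoch=len(train_x) // 32,
                              validation_data=(test_x, test_y))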
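The answer above trains on a single split. A minimal sketch of extending it to K folds, assuming the images and one-hot labels fit in memory as NumPy arrays x and y. Fold indices are generated with NumPy as a stand-in similar in spirit to sklearn.model_selection.KFold(n_splits=k, shuffle=True), which could be used instead. kfold_indices, cross_validate and train_fold are hypothetical names; in real use, train_fold would build a fresh model and call model.fit_generator(aug.flow(x_train, y_train, batch_size=32), validation_data=(x_val, y_val)) as above.

```python
import numpy as np


def kfold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k folds over n_samples items."""
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    folds = np.array_split(perm, k)  # the first n % k folds get one extra item
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx


def cross_validate(x, y, k, train_fold):
    """Call train_fold once per fold and return the list of its results.

    train_fold(x_train, y_train, x_val, y_val) is a user-supplied callback,
    e.g. one that builds a new model and trains it on aug.flow(x_train, y_train).
    """
    scores = []
    for train_idx, val_idx in kfold_indices(len(x), k):
        scores.append(train_fold(x[train_idx], y[train_idx],
                                 x[val_idx], y[val_idx]))
    return scores
```

Building a new model inside train_fold matters: reusing one model across folds would carry weights trained on one fold's validation images into the next fold.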
To anyone who has run into this problem: as of the date this answer was posted, there is, in my opinion and judging by my own searches, no (at least relatively) simple out-of-the-box solution.
The only solution I came up with, when resolving a similar problem in my project, was to partition my dataset into as many partitions as there are folds, and to save them as a dictionary with the partition number as the key and the list of file paths in that partition as the value. After that, you still have to sort your files into class folders for the training and validation subsets respectively.
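A minimal sketch of forming such a partition dictionary, assuming the images are stored as root/<class>/<file> (a hypothetical layout) so each path also carries its class label. Files are shuffled within each class and dealt round-robin into K partitions, which keeps the class mix similar across partitions; this stratification is a choice of the sketch, not necessarily what the author did, and build_partitions is a hypothetical name.

```python
import os
import random


def build_partitions(root, k, seed=0):
    """Return {partition_number: [file paths]} with k partitions.

    Assumes root/<class_name>/<image file>. Files of each class are shuffled
    and dealt round-robin, so every partition gets a similar class mix.
    """
    rng = random.Random(seed)
    partitions = {i: [] for i in range(k)}
    for class_name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, class_name)
        if not os.path.isdir(class_dir):
            continue  # ignore non-directory entries
        files = sorted(os.listdir(class_dir))
        rng.shuffle(files)
        for pos, name in enumerate(files):
            partitions[pos % k].append(os.path.join(class_dir, name))
    return partitions
```

The resulting dictionary can be saved (e.g. with pickle) so that every fold of the experiment is built from the same partitions.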
For example, let K=10. The algorithm can be described like this:
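One possible reading of that algorithm, sketched under assumptions: for fold i (i = 0..K-1), partition i is the validation set and the remaining K-1 partitions form the training set, and the files are copied into fold_dir/train/<class>/ and fold_dir/val/<class>/ so that flow_from_directory can read each subset. The partition dictionary is assumed to map partition numbers to paths of the form .../<class>/<file>. materialize_fold is a hypothetical helper, not the author's script, and it deletes fold_dir before copying.

```python
import os
import shutil


def materialize_fold(partitions, fold, fold_dir):
    """Copy the files for one fold into fold_dir/{train,val}/<class>/.

    partitions: {partition_number: [paths like .../<class>/<file>]}.
    Partition `fold` becomes the validation subset; all others are training.
    """
    # Start from an empty directory so files from the previous fold don't linger.
    if os.path.exists(fold_dir):
        shutil.rmtree(fold_dir)
    for part, paths in partitions.items():
        subset = "val" if part == fold else "train"
        for path in paths:
            class_name = os.path.basename(os.path.dirname(path))
            out_dir = os.path.join(fold_dir, subset, class_name)
            os.makedirs(out_dir, exist_ok=True)
            shutil.copy2(path, out_dir)
```

With K=10, an outer loop would call materialize_fold(partitions, i, "fold_data") for i in range(10), point one flow_from_directory generator at fold_data/train and another at fold_data/val, and train a freshly built model on each fold.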
I'm afraid the code for this solution (including the sorting script and the script that builds the partition dictionary) is too large to post here, but I'll gladly share it if needed.