How to perform prediction using predict_generator on unlabeled test data in Keras?
I'm trying to build an image classification model with 4 classes. Here is my code for building the image generators and running the training:
train_datagen = ImageDataGenerator(rescale=1./255.,
                                   rotation_range=30,
                                   horizontal_flip=True,
                                   validation_split=0.1)
# both generators read from the same directory; `subset` selects the split
train_generator = train_datagen.flow_from_directory(train_dir, target_size=(299, 299),
                                                    class_mode='categorical', batch_size=20,
                                                    subset='training')
validation_generator = train_datagen.flow_from_directory(train_dir, target_size=(299, 299),
                                                         class_mode='categorical', batch_size=20,
                                                         subset='validation')
model.compile(Adam(learning_rate=0.001), loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit_generator(train_generator, steps_per_epoch=int(440/20), epochs=20,
                    validation_data=validation_generator,
                    validation_steps=int(42/20))
Training and validation work perfectly because the images in the train directory are stored in a separate folder for each class. But the test directory has 100 images and no folders inside it. It also doesn't have any labels and only contains image files.
How can I do prediction on the image files in the test folder using Keras?
If you only want to perform prediction, you can load the images with a simple hack like this:
test_datagen = ImageDataGenerator(rescale=1/255.)
test_generator = test_datagen.flow_from_directory('PATH_TO_DATASET_DIR/Dataset',
                                                  # only read images from `test` directory
                                                  classes=['test'],
                                                  # don't generate labels
                                                  class_mode=None,
                                                  # don't shuffle
                                                  shuffle=False,
                                                  # use same size as in training
                                                  target_size=(299, 299))
preds = model.predict_generator(test_generator)
You can access test_generator.filenames to get a list of the corresponding filenames, so that you can map each file to its prediction.
Update (as requested in the comments section): if you want to map predicted classes to filenames, you must first find the predicted classes. If your model is a classification model, it probably has a softmax layer as the classifier, so the values in preds would be probabilities. Use np.argmax to find the index with the highest probability:
preds_cls_idx = preds.argmax(axis=-1)
This gives you the indices of the predicted classes. Now we need to map the indices to their string labels (e.g. "car", "bike", etc.), which are provided by the training generator in its class_indices attribute:
import numpy as np
idx_to_cls = {v: k for k, v in train_generator.class_indices.items()}
preds_cls = np.vectorize(idx_to_cls.get)(preds_cls_idx)
filenames_to_cls = list(zip(test_generator.filenames, preds_cls))
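To make the mapping concrete, here is a minimal self-contained sketch that uses made-up values in place of the real model output. The `preds` array, the `class_indices` dictionary and the file names are hypothetical stand-ins for the result of `model.predict_generator(...)`, `train_generator.class_indices` and `test_generator.filenames`.

```python
import numpy as np

# Hypothetical stand-in for train_generator.class_indices (4 classes).
class_indices = {"bike": 0, "bus": 1, "car": 2, "truck": 3}

# Hypothetical stand-in for test_generator.filenames (order is fixed by shuffle=False).
filenames = ["test/img_001.jpg", "test/img_002.jpg", "test/img_003.jpg"]

# Hypothetical softmax output: one row per image, one column per class.
preds = np.array([
    [0.10, 0.05, 0.80, 0.05],
    [0.70, 0.10, 0.10, 0.10],
    [0.05, 0.15, 0.20, 0.60],
])

# Index of the most probable class in each row.
preds_cls_idx = preds.argmax(axis=-1)

# Invert the label -> index mapping, then look up each predicted index.
idx_to_cls = {v: k for k, v in class_indices.items()}
preds_cls = [idx_to_cls[int(i)] for i in preds_cls_idx]

filenames_to_cls = list(zip(filenames, preds_cls))
print(filenames_to_cls)
```

The sketch uses a plain list comprehension for the lookup instead of np.vectorize; for a test set of this size either should give the same pairs.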
If your folder structure looks like testfolder/folderofallclassfiles, you can use:
test_generator = test_datagen.flow_from_directory(
    directory=pred_dir,
    class_mode=None,
    shuffle=False
)
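If the test images currently sit directly in the prediction directory, the layout described above has to be created first. The following is only a sketch using the standard library; the helper name `move_into_single_subfolder` and the subfolder name `all_images` are hypothetical, and the demonstration runs on a temporary directory with empty placeholder files rather than real images.

```python
import os
import shutil
import tempfile


def move_into_single_subfolder(pred_dir, subfolder="all_images"):
    """Move loose files in pred_dir into pred_dir/subfolder; return the moved names."""
    target = os.path.join(pred_dir, subfolder)
    os.makedirs(target, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(pred_dir)):
        source = os.path.join(pred_dir, name)
        if os.path.isfile(source):  # skips the subfolder itself
            shutil.move(source, os.path.join(target, name))
            moved.append(name)
    return moved


# Demonstration on a throwaway directory with empty placeholder files.
pred_dir = tempfile.mkdtemp()
for name in ["img_1.jpg", "img_2.jpg", "img_3.jpg"]:
    open(os.path.join(pred_dir, name), "w").close()

moved = move_into_single_subfolder(pred_dir)
print(moved)
print(os.listdir(pred_dir))
```

After this step, `pred_dir` contains a single subfolder holding every image, which is the structure the `flow_from_directory` call above expects.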
Before prediction, I would also call reset() on the generator to avoid unwanted outputs.
EDIT:
For your purpose you need to know which image is associated with which prediction. The problem is that the data generator can start at a different position in the dataset each time we use it, which gives different outputs every time. So, in order to restart at the beginning of the dataset in each call to predict_generator(), you would need to match the number of iterations and batches exactly to the dataset size.
There are multiple ways to counter this:
a) You can inspect the internal batch counter through the generator's batch_index attribute.
b) You can create a new data generator before each call to predict_generator().
c) There is a better and simpler way, which is to call reset() on the generator. If you have set shuffle=False in flow_from_directory, it should then start over from the beginning of the dataset and give the exact same output each time, so the ordering of testgen.filenames and testgen.classes matches the ordering of the predictions.
test_generator.reset()
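Why reset() matters can be illustrated with a toy iterator. The `ToyBatchIterator` class below is a simplified, hypothetical stand-in written for this example; it is not the Keras implementation and only mimics the idea of an internal batch counter that carries over between passes.

```python
class ToyBatchIterator:
    """Simplified, hypothetical stand-in for a directory iterator (not Keras code)."""

    def __init__(self, filenames, batch_size):
        self.filenames = filenames
        self.batch_size = batch_size
        self.batch_index = 0  # internal batch counter

    def __len__(self):
        # Number of batches needed to cover every file once (ceiling division).
        return -(-len(self.filenames) // self.batch_size)

    def reset(self):
        # Start again from the first batch.
        self.batch_index = 0

    def next_batch(self):
        start = self.batch_index * self.batch_size
        batch = self.filenames[start:start + self.batch_size]
        # Advance the counter, wrapping around after the last batch.
        self.batch_index = (self.batch_index + 1) % len(self)
        return batch


files = ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"]
gen = ToyBatchIterator(files, batch_size=2)

# Consume a single batch, as an earlier incomplete pass would.
gen.next_batch()

# Without reset(), a full pass starts at the second batch.
without_reset = [name for _ in range(len(gen)) for name in gen.next_batch()]

# With reset(), a full pass lines up with the file list again.
gen.reset()
with_reset = [name for _ in range(len(gen)) for name in gen.next_batch()]

print(without_reset)
print(with_reset)
```

Only the pass made after reset() visits the files in the same order as the `files` list, which is the property needed to pair predictions with filenames.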
Prediction:
# steps must be an integer; round up so the last partial batch is included
prediction = model.predict_generator(test_generator, verbose=1, steps=int(np.ceil(numberofimages / batch_size)))
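The number of steps decides how many images are actually predicted, so a short arithmetic check can help. This sketch assumes the 100 test images from the question and a batch size of 32, which is the default of flow_from_directory when none is passed; the variable names are illustrative.

```python
import math

number_of_images = 100  # size of the test set in the question
batch_size = 32         # assumed: the flow_from_directory default

# Round up so the last, smaller batch is still included.
steps = math.ceil(number_of_images / batch_size)

# Rounding down instead would leave some images without a prediction.
steps_floor = number_of_images // batch_size

print(steps, steps * batch_size >= number_of_images)
print(steps_floor, steps_floor * batch_size >= number_of_images)
```

With these numbers, rounding up gives 4 steps and covers all 100 images, while rounding down gives 3 steps and covers only 96.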
To map the filenames to predictions:
predict_generator returns probabilities, so we first need to convert them to class numbers such as 0, 1, and so on.
predicted_classes = np.argmax(prediction, axis=1)
The next step is to convert those class numbers into the actual class names:
l = dict((v,k) for k,v in training_set.class_indices.items())
prednames = [l[k] for k in predicted_classes]
Getting the filenames:
filenames = test_generator.filenames
Finally, creating the DataFrame:
finaldf = pd.DataFrame({'Filename': filenames,'Prediction': prednames})