In Keras, how do I use predict_generator to run predictions on unlabeled test data?

Question

I'm trying to build an image classification model with 4 classes. Here is my code for building the image generators and running the training:

from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import Adam

train_datagen = ImageDataGenerator(rescale=1./255.,
                               rotation_range=30,
                               horizontal_flip=True,
                               validation_split=0.1)


# both generators must come from train_datagen so that validation_split applies
train_generator = train_datagen.flow_from_directory(train_dir, target_size=(299, 299), 
                                                class_mode='categorical', batch_size=20,
                                                subset='training')

validation_generator = train_datagen.flow_from_directory(train_dir, target_size=(299, 299), 
                                                class_mode='categorical', batch_size=20,
                                                subset='validation')

model.compile(Adam(learning_rate=0.001), loss='categorical_crossentropy',
                                         metrics=['accuracy'])

model.fit_generator(train_generator, steps_per_epoch=int(440/20), epochs=20, 
                              validation_data=validation_generator, 
                              validation_steps=int(42/20))

Training and validation work perfectly because the images in the train directory are stored in a separate folder for each class. The test directory, however, contains 100 images and no subfolders. It has no labels either, only image files.

How can I run predictions on the image files in the test folder using Keras?

Answer 1

If you only want to perform prediction, you can load the images with a simple hack like this:

test_datagen = ImageDataGenerator(rescale=1/255.)

test_generator = test_datagen.flow_from_directory('PATH_TO_DATASET_DIR/Dataset',
                              # only read images from `test` directory
                              classes=['test'],
                              # don't generate labels
                              class_mode=None,
                              # don't shuffle
                              shuffle=False,
                              # use same size as in training
                              target_size=(299, 299))

preds = model.predict_generator(test_generator)

You can access test_generator.filenames to get the list of filenames, in the same order as the predictions, so that you can map each file to its prediction.
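
As a small illustration of that mapping, the sketch below pairs filenames with prediction rows. The filenames and probabilities are made-up stand-ins for test_generator.filenames and preds, and it assumes the generator reports paths relative to the dataset directory (for example test/img_001.jpg), which is why os.path.basename is applied.

```python
import os

# Hypothetical stand-ins for test_generator.filenames and preds.
filenames = ["test/img_001.jpg", "test/img_002.jpg", "test/img_003.jpg"]
preds = [
    [0.10, 0.70, 0.15, 0.05],
    [0.60, 0.20, 0.10, 0.10],
    [0.05, 0.05, 0.10, 0.80],
]

# With shuffle=False, row i of preds belongs to filenames[i].
assert len(filenames) == len(preds)

# Key each prediction row by the bare file name (directory prefix removed).
preds_by_file = {os.path.basename(name): row for name, row in zip(filenames, preds)}

print(preds_by_file["img_002.jpg"])
```

This only holds if the generator was created with shuffle=False; otherwise the order of the rows and the order of the filenames need not agree.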

Update (as requested in the comments section): if you want to map predicted classes to filenames, you must first find the predicted classes. If your model is a classification model, it probably has a softmax layer as the classifier, so the values in preds are probabilities. Use argmax to find the index with the highest probability:

preds_cls_idx = preds.argmax(axis=-1)
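
To make the argmax step concrete, here is a minimal sketch on a made-up array of softmax outputs for 3 images and 4 classes. The numbers are invented for illustration; preds_confidence is an extra name not used in the answer, shown only because the top probability is often handy for spotting unsure predictions.

```python
import numpy as np

# Made-up softmax outputs for 3 images and 4 classes (each row sums to 1).
preds = np.array([
    [0.10, 0.70, 0.15, 0.05],
    [0.60, 0.20, 0.10, 0.10],
    [0.05, 0.05, 0.10, 0.80],
])

# Index of the most probable class for each image (one entry per row).
preds_cls_idx = preds.argmax(axis=-1)

# Probability assigned to that class.
preds_confidence = preds.max(axis=-1)

print(preds_cls_idx)     # [1 0 3]
print(preds_confidence)  # [0.7 0.6 0.8]
```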

This gives you the indices of the predicted classes. Now we need to map the indices to their string labels (e.g. "car", "bike", etc.), which the training generator provides in its class_indices attribute:

import numpy as np

idx_to_cls = {v: k for k, v in train_generator.class_indices.items()}
preds_cls = np.vectorize(idx_to_cls.get)(preds_cls_idx)
filenames_to_cls = list(zip(test_generator.filenames, preds_cls))
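
The same index-to-label mapping can be tried without a trained model. In the sketch below the class names, filenames, and predicted indices are all hypothetical stand-ins, and a plain list comprehension is used in place of np.vectorize; both approaches should yield the same labels.

```python
# Hypothetical stand-ins for train_generator.class_indices,
# test_generator.filenames and the argmax result.
class_indices = {"bike": 0, "car": 1, "truck": 2, "van": 3}
filenames = ["test/img_001.jpg", "test/img_002.jpg", "test/img_003.jpg"]
preds_cls_idx = [1, 0, 3]

# Invert label -> index into index -> label.
idx_to_cls = {v: k for k, v in class_indices.items()}

# Look up the label for every predicted index.
preds_cls = [idx_to_cls[i] for i in preds_cls_idx]

# Pair each file with its predicted label (order is preserved).
filenames_to_cls = list(zip(filenames, preds_cls))
for name, label in filenames_to_cls:
    print(name, label)
```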

Answer 2

Your folder structure should look like testfolder/folderofallclassfiles, that is, all test images sit in a single subfolder of the test folder.

You can then use:

test_generator = test_datagen.flow_from_directory(
    directory=pred_dir,
    # same input size as the training generators in the question
    target_size=(299, 299),
    class_mode=None,
    shuffle=False
)

Before predicting, I would also call reset() on the generator to avoid unwanted outputs.

EDIT:

For your purpose you need to know which image is associated with which prediction. The problem is that the data generator keeps an internal batch counter, so each time we use it, it may start at a different position in the dataset and give us different outputs. To restart at the beginning of the dataset on each call to predict_generator(), you would need the number of iterations and batches to match the dataset size exactly.
There are multiple ways to counter this:

a) You can inspect the internal batch counter using the generator's batch_index attribute.
b) You can create a new data generator before each call to predict_generator().
c) A better and simpler way is to call reset() on the generator. If you have set shuffle=False in flow_from_directory, it then starts over from the beginning of the dataset and gives exactly the same output each time, so the ordering of test_generator.filenames and test_generator.classes matches the predictions.
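
The effect of the batch counter can be sketched with a toy iterator. ToyIterator below is a hypothetical stand-in written for this illustration, not Keras's actual implementation; it only mimics the idea of a batch_index that advances on every batch and is rewound by reset().

```python
class ToyIterator:
    """Toy stand-in for a directory iterator; not Keras's actual implementation."""

    def __init__(self, items, batch_size):
        self.items = items
        self.batch_size = batch_size
        self.batch_index = 0  # internal batch counter

    def __len__(self):
        # Number of batches per pass, counting a final partial batch.
        return -(-len(self.items) // self.batch_size)

    def next_batch(self):
        start = self.batch_index * self.batch_size
        batch = self.items[start:start + self.batch_size]
        # Wrap around to the first batch after the last one.
        self.batch_index = (self.batch_index + 1) % len(self)
        return batch

    def reset(self):
        # Rewind to the first batch.
        self.batch_index = 0


files = ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"]
gen = ToyIterator(files, batch_size=2)

gen.next_batch()                  # consumes ["a.jpg", "b.jpg"]
without_reset = gen.next_batch()  # continues from the second batch

gen.reset()
after_reset = gen.next_batch()    # starts again from the first batch

print(without_reset)  # ['c.jpg', 'd.jpg']
print(after_reset)    # ['a.jpg', 'b.jpg']
```

If a prediction run began where without_reset begins, its first output row would belong to c.jpg rather than a.jpg, and a filename list that starts at a.jpg would be misaligned with it.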

test_generator.reset()

Prediction

# steps must be a whole number; round up so the last partial batch is included
prediction = model.predict_generator(test_generator, verbose=1, steps=int(np.ceil(numberofimages / batch_size)))
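
The steps argument counts batches, so it should be a whole number that covers every image. A minimal sketch of that arithmetic, where prediction_steps is a hypothetical helper name and the batch size of 32 is an assumed value:

```python
import math


def prediction_steps(num_images, batch_size):
    """Number of batches needed to cover every image exactly once."""
    return math.ceil(num_images / batch_size)


# 100 test images (as in the question) with an assumed batch size of 32.
steps = prediction_steps(100, 32)

print(steps)          # 4
print(100 / 32)       # 3.125, not a whole number of batches
print(int(100 / 32))  # 3, which would skip the last 4 images
```

Rounding down instead of up yields fewer predictions than filenames, which breaks the filename-to-prediction mapping later on.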

To map the filenames to the predictions:

predict_generator outputs probabilities, so first we need to convert them to class numbers such as 0, 1, ...:

predicted_classes = np.argmax(prediction, axis=1)

The next step is to convert those class numbers into actual class names:

# invert the label -> index mapping of the training generator
idx_to_label = dict((v, k) for k, v in train_generator.class_indices.items())
prednames = [idx_to_label[k] for k in predicted_classes]

Getting the filenames:

filenames = test_generator.filenames

Finally, creating the DataFrame:

import pandas as pd
finaldf = pd.DataFrame({'Filename': filenames, 'Prediction': prednames})
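
Building the table only works if there is exactly one prediction per filename. The sketch below adds a guard for that; pair_predictions is a hypothetical helper, not part of Keras or pandas, and the sample values are invented.

```python
def pair_predictions(filenames, prednames):
    """Pair each filename with its predicted label, refusing mismatched lengths."""
    if len(filenames) != len(prednames):
        raise ValueError(
            f"{len(filenames)} filenames but {len(prednames)} predictions; "
            "check the steps argument and call reset() before predicting"
        )
    return [{"Filename": f, "Prediction": p} for f, p in zip(filenames, prednames)]


# Hypothetical values standing in for test_generator.filenames and prednames.
rows = pair_predictions(["a.jpg", "b.jpg"], ["car", "bike"])
print(rows)
```

The resulting list of dictionaries can be passed to pd.DataFrame(rows) to get the same two-column table.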
