
How should I use model.predict_generator to evaluate model performance in a Confusion Matrix?

I am trying to evaluate a transfer-learning model on the common filtered cats-and-dogs dataset using a confusion matrix. The code is based on the TensorFlow transfer learning tutorial. The training accuracy curves show an accuracy above 90%.

However, using a generator to get the true labels and model.predict_generator to get the prediction array yields inconsistent results. First, accuracy is not stable: if you run the prediction a second time, the values change. Second, the predictions I get from model.predict_generator appear wrong compared with model.predict on individual images.

To quickly test the confusion matrix based on ImageDataGenerator, I downloaded 5 images of cats and 5 images of dogs. Then I created another generator from that folder and checked that its labels and classes matched those used in training.

Two strange behaviors: after that, I simply used sklearn's confusion matrix to evaluate a prediction made with model.predict_generator, taking the labels I get from the generator as the true labels.

On the first run I got 0.9 accuracy and said cheers! However, if I run model.predict_generator a second time, it returns different values in the output array and accuracy drops to 0.5. After that it does not change anymore... Which result is correct? Why does it change?

I have noticed that you have to run it twice to get a final result, but even that result is wrong. I wrote some code to test each image individually and got no prediction errors. So what am I doing wrong? Or are generators not applicable to this situation? This is a bit confusing.

The code can be checked at my GitHub repository and can be run in Google Colaboratory if you have no GPU. In fact, it runs well on my little Toshiba Satellite with an NVIDIA GPU of just 2 GB and 300 CUDA cores.

Complete code at my git.

The code is organized as a Jupyter notebook, but here I add the code. The transfer learning part is based on https://www.tensorflow.org/tutorials/images/transfer_learning

To create the generator:

test_base_dir = '.'
test_dir = os.path.join( test_base_dir, 'test')
test_datagen_2 = ImageDataGenerator( rescale = 1.0/255. )
test_generator = test_datagen_2.flow_from_directory( test_dir,
                                                     batch_size  = 1,
                                                     class_mode  = 'binary', 
                                                     target_size = (image_size, image_size))

And for prediction:

filenames = test_generator.filenames
nb_samples = len(filenames)
y_predict = model.predict_generator(test_generator, steps=nb_samples)
y_predict

which I round using numpy before finally applying the confusion-matrix metric:


from sklearn.metrics import confusion_matrix

y_predict_rounded = np.round(y_predict)  # round sigmoid outputs to hard 0/1 labels
cm = confusion_matrix(y_true=test_generator.labels, y_pred=y_predict_rounded)
cm
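For reference, here is a minimal, self-contained sketch of that rounding-plus-confusion-matrix step; the synthetic sigmoid outputs and labels below are stand-ins for the actual model.predict_generator result and test_generator.labels:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Synthetic sigmoid outputs standing in for model.predict_generator's result:
# shape (n_samples, 1) for a binary classifier with a single output unit.
y_predict = np.array([[0.1], [0.9], [0.4], [0.8], [0.3]])

# True labels in the order the generator reports them (cats=0, dogs=1).
y_true = np.array([0, 1, 0, 1, 1])

# Round probabilities to hard 0/1 predictions and flatten to 1-D.
y_predict_rounded = np.round(y_predict).astype(int).ravel()

# Rows = true class, columns = predicted class.
cm = confusion_matrix(y_true=y_true, y_pred=y_predict_rounded)
print(cm)
```

Here the last sample (true dog, predicted 0.3) lands in the false-negative cell, so the matrix is [[2, 0], [1, 2]].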

The manual verification, instead, is:

def prediction(path_img):
    img = image.load_img(path_img, target_size=(150, 150))
    x = image.img_to_array(img)
    x = x / 255.
    x = np.expand_dims(x, axis=0)
    classes = model.predict(x)
    plt.imshow(img)
    if classes > 0.5:
        print(path_img.split('/')[-1] + ' is a dog')
    else:
        print(path_img.split('/')[-1] + ' is a cat')
    return classes

which I use in the following way:

y_pred_m = []
files=[]
for filename in os.listdir(test_dir):
    file = test_dir+'/'+filename
    for item in os.listdir(file):
        file2 = file+'/'+item
        if file2.split('.')[-1]=='jpg':
            files.append(file2)

And prediction goes:

prediction_array = [prediction(img) for img in files]

np.round(prediction_array, decimals=0)
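That per-image check can then be scored against labels derived from the folder names. A minimal sketch, with hypothetical file paths and rounded outputs standing in for the real ones:

```python
import numpy as np

# Hypothetical stand-ins: paths as collected by the loop above, and the
# rounded outputs of prediction() for each file (output > 0.5 means dog).
files = ['./test/cats/cat1.jpg', './test/cats/cat2.jpg',
         './test/dogs/dog1.jpg', './test/dogs/dog2.jpg']
rounded = np.array([0.0, 0.0, 1.0, 1.0])

# Derive the true label from the parent folder name, matching the
# cats=0 / dogs=1 mapping that flow_from_directory assigns alphabetically.
y_true = np.array([1 if f.split('/')[-2] == 'dogs' else 0 for f in files])

accuracy = np.mean(rounded == y_true)
print(accuracy)
```

Scored this way, the per-image path agrees with training accuracy, which is what points the finger at the generator-based evaluation rather than at the model.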

The expected result should be a confusion matrix with an accuracy level similar to training, since verifying each example individually shows no prediction errors; model.predict_generator, however, seems to go wrong.

The problem was that, by default, flow_from_directory uses shuffle=True. Predictions are correct if shuffle is set to False. However, using the validation dataset to evaluate training seems to work correctly even when shuffle is True. I have updated the git repository with these changes.
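The effect of shuffle=True can be illustrated without a model at all: even a perfect classifier looks like chance if its (shuffled) output order is compared against labels in directory order. A sketch, assuming the 5-cats/5-dogs layout above:

```python
import numpy as np

rng = np.random.default_rng(0)

# True labels in directory order: 5 cats (0) then 5 dogs (1),
# which is the order test_generator.labels reports.
labels = np.array([0] * 5 + [1] * 5)

# Pretend the model is perfect: its prediction for each image equals
# that image's label. With shuffle=True the generator yields images in a
# random order, so the prediction array comes back permuted...
order = rng.permutation(len(labels))
preds_shuffled = labels[order]

# ...but it is compared against labels in the ORIGINAL directory order,
# so accuracy collapses toward chance despite every prediction being right.
acc_shuffled = np.mean(preds_shuffled == labels)

# With shuffle=False the two orders line up and accuracy is exact.
acc_unshuffled = np.mean(labels == labels)

print(acc_shuffled, acc_unshuffled)
```

This also explains why validation during training is unaffected: there, each shuffled batch carries its own labels along with the images, so predictions and labels never get misaligned.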

# Flow test images in batches of 1 using the test_datagen_2 generator,
# with shuffle disabled so predictions align with test_generator.labels
test_generator = test_datagen_2.flow_from_directory( test_dir,
                                                     batch_size  = 1,
                                                     class_mode  = 'binary', 
                                                     target_size = (image_size, image_size),
                                                     shuffle     = False)

