
Generating a confusion matrix in Keras for multiclass classification

Training reaches up to 98% accuracy, but the confusion matrix shows very high misclassification.

I am working on multiclass classification using Keras, with a transfer learning approach on a pre-trained VGG16 model.

The problem is to classify images into 5 types of tomato diseases using a CNN.

There are 5 disease classes, with 6970 training images and 70 test images.

Training shows 98.65% accuracy, while testing shows 94% accuracy.

The problem is that when I generate the confusion matrix, it shows very high misclassification.

Could someone please help me figure out whether my code is wrong or the model is wrong? I am not sure whether my model is giving me correct results.

I would also appreciate an explanation of how Keras actually calculates accuracy with model.fit_generator, because applying the general accuracy formula to the confusion matrix does not give me the same result that Keras reports.

The code for testing on the dataset is:

test_generator = test_datagen.flow_from_directory(
    test_dir,
    target_size=(150, 150),
    batch_size=20,
    class_mode='categorical')
test_loss, test_acc = model.evaluate_generator(test_generator, steps=50)
print('test acc:', test_acc)

I found code to generate a confusion matrix on a forum. The code is:

import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

batch_size = 20
num_of_test_samples = 70
predictions = model.predict_generator(test_generator, num_of_test_samples // batch_size + 1)

y_pred = np.argmax(predictions, axis=1)

true_classes = test_generator.classes

class_labels = list(test_generator.class_indices.keys())   

print(class_labels)

print(confusion_matrix(test_generator.classes, y_pred))

report = classification_report(true_classes, y_pred, target_names=class_labels)
print(report)

These are the results I get:

Test accuracy:

Found 70 images belonging to 5 classes.
test acc: 0.9420454461466182

Confusion matrix results:

['TEB', 'TH', 'TLB', 'TLM', 'TSL']
[[2 3 2 4 3]
 [4 2 3 0 5]
 [3 3 3 2 3]
 [3 3 2 4 2]
 [2 2 4 4 2]]
              precision    recall  f1-score   support

         TEB       0.14      0.14      0.14        14
          TH       0.15      0.14      0.15        14
         TLB       0.21      0.21      0.21        14
         TLM       0.29      0.29      0.29        14
         TSL       0.13      0.14      0.14        14

   micro avg       0.19      0.19      0.19        70
   macro avg       0.19      0.19      0.19        70
weighted avg       0.19      0.19      0.19        70

When creating the test data generator, flow_from_directory uses shuffle=True by default. As a result, when you predict by passing in the generator, the predictions do not come back in the same order as the true classes in test_generator.classes. The predictions themselves are right, but they are paired with the wrong labels, so the confusion matrix shows poor performance.
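To see why a shuffled generator can produce a matrix like the one in the question, here is a small NumPy simulation. It does not use Keras: the labels, the 66-out-of-70 hit rate, and the random permutation are all made-up stand-ins for test_generator.classes, the model's predictions, and the shuffled batch order.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for test_generator.classes: 5 classes x 14 images, in directory order.
y_true = np.repeat(np.arange(5), 14)

# Stand-in for a good model: correct everywhere except 4 images (66/70, about 94%).
y_model = y_true.copy()
wrong = rng.choice(len(y_true), size=4, replace=False)
y_model[wrong] = (y_model[wrong] + 1) % 5

# shuffle=True: predictions come back in a random order,
# while y_true stays in directory order.
order = rng.permutation(len(y_true))
y_pred_shuffled = y_model[order]


def confusion(y_true, y_pred, n_classes=5):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


cm_aligned = confusion(y_true, y_model)
cm_shuffled = confusion(y_true, y_pred_shuffled)

acc_aligned = np.trace(cm_aligned) / cm_aligned.sum()
acc_shuffled = np.trace(cm_shuffled) / cm_shuffled.sum()
print("aligned accuracy: ", acc_aligned)
print("shuffled accuracy:", acc_shuffled)
```

The simulated model is identical in both cases; only the pairing between predictions and labels changes. With five balanced classes, the shuffled pairing would typically land near chance level.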

Set shuffle to False in the test data generator and the predictions will come out in the right order. Since the purpose of validation/test data is to evaluate the model, you can almost always set shuffle to False for it.
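On the question's other point, the general accuracy formula is the sum of the confusion matrix diagonal divided by the total count, and Keras's categorical accuracy is likewise the fraction of samples whose highest-scoring class matches the label. As a quick check, the sketch below applies the formula to the matrix posted in the question (copied by hand). It gives about 0.19, which agrees with the micro average in the classification report, not with the 0.94 from evaluate_generator. That gap is consistent with misaligned labels, not with a different formula.

```python
import numpy as np

# Confusion matrix as posted in the question (rows: true, columns: predicted).
cm = np.array([
    [2, 3, 2, 4, 3],
    [4, 2, 3, 0, 5],
    [3, 3, 3, 2, 3],
    [3, 3, 2, 4, 2],
    [2, 2, 4, 4, 2],
])

correct = np.trace(cm)   # predictions on the diagonal
total = cm.sum()         # all test images
accuracy = correct / total

# Per-class recall: diagonal entry divided by the row total.
recall = np.diag(cm) / cm.sum(axis=1)

print(correct, total, round(accuracy, 4))  # 13 70 0.1857
print(np.round(recall, 2))
```

Note also that with 70 images and batch_size=20, four steps cover the test set once, so evaluate_generator with steps=50 cycles through the same images several times. Once the generator is unshuffled, the two numbers should be comparable but need not match to the last digit.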

The test labels should be class_indices rather than classes:

true_classes = test_generator.class_indices
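One caveat when trying this: on a Keras directory iterator, class_indices is a dict mapping class names to integer indices, while classes holds one integer label per image. The sketch below uses hand-written stand-ins for the two attributes (the per-image labels are invented) to show the difference in shape. It does not call Keras.

```python
import numpy as np

# Stand-in for test_generator.class_indices: class name -> integer index.
class_indices = {'TEB': 0, 'TH': 1, 'TLB': 2, 'TLM': 3, 'TSL': 4}

# Stand-in for test_generator.classes: one integer label per image
# (invented here: 14 images per class, in directory order).
classes = np.repeat(np.arange(5), 14)

# Display names for the report come from the dict keys ...
class_labels = list(class_indices.keys())

# ... while the per-image ground truth has to be an array with the
# same length as the predictions.
print(class_labels)
print(len(class_indices), classes.shape)
```

A function such as confusion_matrix expects two label sequences of equal length, which is what classes provides for the 70 test images; the dict has only five entries.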

Always do the following before computing any classification performance metrics:

  1. First, reset the generator you are using for prediction.
  2. Set shuffle to False in flow_from_directory().
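The two steps above can be wrapped in a small helper. This is a sketch under assumptions: predictions_in_order is a hypothetical name, and model and generator are expected to behave like the objects in the question (a predict method on the model; reset(), shuffle, and classes on the generator). It calls model.predict, which newer Keras versions use in place of predict_generator.

```python
import numpy as np


def predictions_in_order(model, generator):
    """Return (y_true, y_pred) aligned row by row.

    Hypothetical helper: assumes `generator` was created with
    shuffle=False, so its batches follow the order of generator.classes.
    """
    if getattr(generator, "shuffle", False):
        raise ValueError("create the test generator with shuffle=False")

    generator.reset()                  # step 1: start from the first batch
    probs = model.predict(generator)   # one row of class scores per image
    y_pred = np.argmax(probs, axis=1)
    y_true = np.asarray(generator.classes)

    if len(y_pred) != len(y_true):
        raise ValueError("prediction count does not match label count")
    return y_true, y_pred
```

With the question's objects this would be called as y_true, y_pred = predictions_in_order(model, test_generator), and the two arrays passed on to confusion_matrix and classification_report.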

I may be late to the party, but maybe you aren't preprocessing the test data the same way as the training data. Try importing the preprocessing function for VGG16 and passing it to the generator as the preprocessing_function parameter.
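To illustrate how much the choice of preprocessing matters, here is a NumPy sketch comparing two ways of preparing the same image. Both functions are illustrative stand-ins, not Keras code: vgg16_style_preprocess mimics the Caffe-style scheme that keras.applications.vgg16.preprocess_input applies (RGB to BGR, subtract ImageNet channel means), and rescale_preprocess mimics a generator built with rescale=1./255. Which of the two the question's training pipeline used is not stated, so this is only an assumption about what a mismatch could look like.

```python
import numpy as np

# ImageNet channel means in BGR order, as used by Caffe-style VGG16 preprocessing.
IMAGENET_BGR_MEANS = np.array([103.939, 116.779, 123.68])


def vgg16_style_preprocess(rgb_image):
    """Illustrative stand-in: RGB -> BGR, then subtract the per-channel
    ImageNet means (no rescaling to [0, 1])."""
    bgr = np.asarray(rgb_image, dtype=np.float64)[..., ::-1]
    return bgr - IMAGENET_BGR_MEANS


def rescale_preprocess(rgb_image):
    """Illustrative stand-in for rescale=1./255: map pixels to [0, 1]."""
    return np.asarray(rgb_image, dtype=np.float64) / 255.0


# A made-up 150x150 mid-gray image.
image = np.full((150, 150, 3), 128, dtype=np.uint8)

print(vgg16_style_preprocess(image)[0, 0])  # zero-centered, not rescaled
print(rescale_preprocess(image)[0, 0])      # values in [0, 1]
```

The two outputs differ by orders of magnitude, so a model trained on one and tested on the other would see inputs from a very different range. In Keras itself, the suggestion amounts to passing preprocess_input as the preprocessing_function argument of ImageDataGenerator for both the training and the test generator.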

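Notice: Technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.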

 