Good test accuracy, but poor confusion matrix results

Question

I've trained a model to classify 4 types of eye diseases using MobileNet as the pretrained model. I achieved a test accuracy of 94%, but when I look at the confusion matrix, it seems like it isn't doing so well. Loss is relatively low on training, validation, and testing. Any suggestions on where I went wrong, or whether I'm missing something conceptually?

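import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adamax
from tensorflow.keras.preprocessing.image import ImageDataGenerator

Image_height = 224
Image_width = 224
val_split = 0.20
batches_size = 16
lr = 0.0005
spe = 220  # steps per epoch
vs = 32    # validation steps
epoch = 6

# Paths to the training and test set folders
train_folder = "/content/drive/My Drive/Research/train"
test_folder = "/content/drive/My Drive/Research/test"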



#Creating batches
train_batches = ImageDataGenerator(preprocessing_function=tf.keras.applications.mobilenet.preprocess_input,validation_split=val_split) \
    .flow_from_directory(directory=train_folder, target_size=(Image_height,Image_width), classes=['CNV','DME','DRUSEN','NORMAL'], batch_size=batches_size,class_mode="categorical",
                              subset="training")
validation_batches = ImageDataGenerator(preprocessing_function=tf.keras.applications.mobilenet.preprocess_input,validation_split=val_split) \
    .flow_from_directory(directory=train_folder, target_size=(Image_height,Image_width), classes=['CNV','DME','DRUSEN','NORMAL'], batch_size=batches_size,class_mode="categorical",
                              subset="validation")
test_batches = ImageDataGenerator(preprocessing_function=tf.keras.applications.mobilenet.preprocess_input) \
                       .flow_from_directory(test_folder, target_size=(Image_height,Image_width), 
                         classes=['CNV','DME','DRUSEN','NORMAL'], batch_size=batches_size,class_mode="categorical")



mobile = tf.keras.applications.mobilenet.MobileNet(include_top=False,
                                                           input_shape=(224, 224,3),
                                                           pooling='max', weights='imagenet',
                                                           alpha=1, depth_multiplier=1,dropout=.5)
x=mobile.layers[-1].output
x=keras.layers.BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001 )(x)
predictions=Dense (4, activation='softmax')(x)
model = Model(inputs=mobile.input, outputs=predictions)    
for layer in model.layers:
    layer.trainable=True
model.compile(Adamax(learning_rate=lr), loss='categorical_crossentropy', metrics=['accuracy'])
checkpoint=tf.keras.callbacks.ModelCheckpoint(filepath="/content/drive/My Drive/Research/ModelCheckpoint", monitor='val_loss', verbose=0, save_best_only=True,
    save_weights_only=False, mode='auto', save_freq='epoch', options=None)
lr_adjust=tf.keras.callbacks.ReduceLROnPlateau( monitor="val_loss", factor=0.5, patience=1, verbose=0, mode="auto",
    min_delta=0.00001,  cooldown=0,  min_lr=0) 
callbacks=[checkpoint, lr_adjust]


model.fit(train_batches, steps_per_epoch=spe,
          validation_data=validation_batches, validation_steps=vs,
          epochs=epoch, callbacks=callbacks)

# Evaluate accuracy on the test set
acc = model.evaluate_generator(test_batches, steps=len(test_batches), verbose=1)
print("Model Accuracy on Test Data", acc[1]*100)


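# True class index of every test image, read batch by batch
y = []
for x in range(0,len(test_batches)):
  for i in range(0,len(test_batches[x][1])):
    #print(test_batches[0][1][i])
    y.append(np.argmax(test_batches[x][1][i]))
print(len(y))

# Predicted class probabilities, from a second, separate pass over test_batches
predictions = model.predict(test_batches, steps=len(test_batches))
con_mat = tf.math.confusion_matrix(labels=y, predictions=np.argmax(predictions, axis=1)).numpy()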
print(con_mat)

Training/Validation

Epoch 1/6
220/220 [==============================] - 2952s 13s/step - loss: 0.5842 - accuracy: 0.7912 - val_loss: 0.7926 - val_accuracy: 0.7988
Epoch 2/6
220/220 [==============================] - 2736s 12s/step - loss: 0.4041 - accuracy: 0.8723 - val_loss: 0.3094 - val_accuracy: 0.9023
Epoch 3/6
220/220 [==============================] - 2635s 12s/step - loss: 0.3718 - accuracy: 0.8804 - val_loss: 0.3871 - val_accuracy: 0.8906
Epoch 4/6
220/220 [==============================] - 2517s 11s/step - loss: 0.2904 - accuracy: 0.8980 - val_loss: 0.2863 - val_accuracy: 0.9160
Epoch 5/6
220/220 [==============================] - 2364s 11s/step - loss: 0.2779 - accuracy: 0.9057 - val_loss: 0.3500 - val_accuracy: 0.9238
Epoch 6/6
220/220 [==============================] - 2241s 10s/step - loss: 0.2839 - accuracy: 0.9068 - val_loss: 0.2202 - val_accuracy: 0.9355
<tensorflow.python.keras.callbacks.History at 0x7f6f8a59eb70>

Testing

WARNING:tensorflow:From <ipython-input-12-d213edec98d3>:2: Model.evaluate_generator (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
Please use Model.evaluate, which supports generators.
63/63 [==============================] - 837s 13s/step - loss: 0.1519 - accuracy: 0.9410
Model Accuracy on Test Data 94.0999984741211

Confusion Matrix

[[70 62 57 61]
 [82 61 41 66]
 [74 69 49 58]
 [77 60 48 65]]
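
A quick arithmetic check on the matrix above, assuming the tf.math.confusion_matrix convention (rows are true classes, columns are predicted classes) and the class order passed to flow_from_directory. The diagonal sums to 245 of 1000 images, about 24.5%, which is near the 25% chance level for four classes and far from the reported 94.1%. Each row is also close to the counts you would expect if labels and predictions were paired independently. The helper name matrix_summary is illustrative only.

```python
# Sanity check of the confusion matrix posted in the question.
# Assumption: rows = true class, columns = predicted class, in the order
# ['CNV', 'DME', 'DRUSEN', 'NORMAL'] (tf.math.confusion_matrix convention).
CON_MAT = [
    [70, 62, 57, 61],
    [82, 61, 41, 66],
    [74, 69, 49, 58],
    [77, 60, 48, 65],
]


def matrix_summary(mat):
    """Return total count, accuracy, row sums, column sums, and the counts
    expected if labels and predictions were paired independently."""
    total = sum(sum(row) for row in mat)
    correct = sum(mat[i][i] for i in range(len(mat)))
    row_sums = [sum(row) for row in mat]
    col_sums = [sum(col) for col in zip(*mat)]
    # Independence: expected[i][j] = row_sum[i] * col_sum[j] / total
    expected = [[r * c / total for c in col_sums] for r in row_sums]
    return total, correct / total, row_sums, col_sums, expected


total, accuracy, row_sums, col_sums, expected = matrix_summary(CON_MAT)
print("images:", total)                                   # 1000
print("accuracy:", accuracy)                              # 0.245
print("true-class counts:", row_sums)                     # [250, 250, 250, 250]
print("predicted-class counts:", col_sums)                # [303, 252, 195, 250]
print("expected row under random pairing:", expected[0])  # [75.75, 63.0, 48.75, 62.5]
```

Compare the expected row with each actual row, for example [70, 62, 57, 61] or [77, 60, 48, 65]: they are close, which is what a random pairing of labels and predictions would produce.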

Answer 1

I know this is super old, but I just ran into a similar problem and was frustrated not to find an answer here. So here it goes:

Setting shuffle=False in the ImageDataGenerator().flow_from_directory() call that creates test_batches should solve the problem.
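
If you would rather keep the test set shuffled, another option (a sketch, not part of the original answer) is to take labels and predictions from the same batch in a single pass, so they cannot drift apart. It assumes that len(batches) gives the number of batches and that batches[i] returns (images, one_hot_labels), as a Keras DirectoryIterator does, and it uses the Keras Model.predict_on_batch method. The function names labels_and_predictions and confusion_matrix are hypothetical. With shuffle=False, the labels are also available directly as test_batches.classes, in the same order that model.predict(test_batches) processes the files.

```python
import numpy as np


def labels_and_predictions(model, batches):
    """Collect true and predicted class indices from the SAME batches.

    Assumes batches[i] returns (images, one_hot_labels), as a Keras
    DirectoryIterator does, and model.predict_on_batch returns probabilities.
    """
    y_true, y_pred = [], []
    for i in range(len(batches)):
        images, one_hot = batches[i]  # load each batch exactly once
        probs = np.asarray(model.predict_on_batch(images))
        # Labels and predictions come from the same images, so they stay aligned
        y_true.extend(np.argmax(one_hot, axis=1).tolist())
        y_pred.extend(np.argmax(probs, axis=1).tolist())
    return np.array(y_true, dtype=int), np.array(y_pred, dtype=int)


def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    mat = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(mat, (y_true, y_pred), 1)  # count each (true, predicted) pair
    return mat
```

In the question's setup this would be called as y_true, y_pred = labels_and_predictions(model, test_batches), followed by confusion_matrix(y_true, y_pred, 4); the question's tf.math.confusion_matrix call works on these arrays as well.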

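It seems that the data generator yields different batches when it is iterated twice: first by your loop that extracts the labels, and then by model.predict(test_batches). flow_from_directory uses shuffle=True by default, and the generator reshuffles its image order at the end of each epoch, so the two passes need not visit the images in the same order. The labels and predictions then no longer match, because they come from different batches. The accuracy reported by evaluate is unaffected, since it compares each image's prediction with its own label inside the same batch.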
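
The effect can be reproduced without Keras. The sketch below assumes a hypothetical model that is right 94% of the time on 1,000 images (250 per class, as in the question), then pairs the labels from one shuffled pass with predictions taken either in the same order or in a second, independently shuffled order. The seed, accuracy, and function names are illustrative assumptions, not values from the original code.

```python
import random

NUM_CLASSES = 4
PER_CLASS = 250  # assumed: 250 test images per class, matching the question's row sums


def count_matrix(labels, preds, num_classes=NUM_CLASSES):
    """Confusion matrix: rows are true classes, columns are predicted classes."""
    mat = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(labels, preds):
        mat[t][p] += 1
    return mat


def simulate(seed=0, model_accuracy=0.94):
    rng = random.Random(seed)
    true_labels = [c for c in range(NUM_CLASSES) for _ in range(PER_CLASS)]

    # Hypothetical fixed model: one prediction per image, correct with prob. model_accuracy
    preds_by_image = []
    for label in true_labels:
        if rng.random() < model_accuracy:
            preds_by_image.append(label)
        else:
            preds_by_image.append(rng.choice([c for c in range(NUM_CLASSES) if c != label]))

    # Two passes over the data, each with its own shuffled order (like shuffle=True)
    order_labels = list(range(len(true_labels)))
    order_preds = list(range(len(true_labels)))
    rng.shuffle(order_labels)
    rng.shuffle(order_preds)

    labels = [true_labels[i] for i in order_labels]
    aligned_preds = [preds_by_image[i] for i in order_labels]    # same order as labels
    mismatched_preds = [preds_by_image[i] for i in order_preds]  # a different order
    return count_matrix(labels, aligned_preds), count_matrix(labels, mismatched_preds)


def diag_fraction(mat):
    """Fraction of counts on the diagonal, i.e. accuracy read off the matrix."""
    return sum(mat[i][i] for i in range(len(mat))) / sum(map(sum, mat))


aligned, mismatched = simulate()
print("same order:      ", round(diag_fraction(aligned), 3), aligned)
print("different orders:", round(diag_fraction(mismatched), 3), mismatched)
```

Both matrices are built from exactly the same predictions; only the pairing with labels differs. The aligned one is strongly diagonal, while the mismatched one spreads each row almost evenly across the columns, much like the matrix in the question.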

Solution 1
0 2022-03-31 13:39:13
