
How should I use model.predict_generator to evaluate model performance in a confusion matrix?

I am trying to evaluate a transfer learning model on the common cats-and-dogs filtered dataset using a confusion matrix. I have based the code on the TensorFlow transfer learning tutorial. The training accuracy graphs show an accuracy above 90%.

However, using generators to get the true labels and model.predict_generator to get the prediction array gives inconsistent results. First, the accuracy is not stable: if you run the prediction a second time, the values change. Second, the predictions I get from model.predict_generator seem to be wrong compared to model.predict on individual instances.

To quickly test the confusion matrix based on the ImageDataGenerator, I downloaded 5 images of cats and 5 images of dogs. Then I created another generator from that folder and checked that its labels and classes are the same as in training.

Two strange behaviors

After that, I simply used sklearn's confusion_matrix metric to evaluate a prediction from model.predict_generator against the labels I get from the generator as the true labels.

On the first run I got 0.9 accuracy and said cheers! However, if I run model.predict_generator a second time, it returns different values in the output array and the accuracy drops to 0.5. After that it does not change anymore... Which result is correct? Why does it change?

I have noticed that you have to run it twice to get a final result, but the result obtained is wrong. I wrote some code to test each image individually and got no wrong predictions. So what am I doing wrong? Or are generators not applicable to this situation? This is a bit confusing.

The code can be checked in my GitHub repository and can be run in Google Colaboratory if you have no GPU. In fact, it runs well on my little Toshiba Satellite with an NVIDIA GPU of just 2 GB and 300 CUDA cores.

The complete code is in my git repository.

The code is organized as a Jupyter notebook; however, I add the relevant code here. The transfer learning is based on https://www.tensorflow.org/tutorials/images/transfer_learning

To create the generator:

import os
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# image_size is defined earlier in the notebook
test_base_dir = '.'
test_dir = os.path.join(test_base_dir, 'test')
test_datagen_2 = ImageDataGenerator(rescale=1.0/255.)   # only rescale, no augmentation
test_generator = test_datagen_2.flow_from_directory(test_dir,
                                                    batch_size=1,
                                                    class_mode='binary',
                                                    target_size=(image_size, image_size))
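
One quick way to confirm that the test generator maps classes the same way as in training (as mentioned above) is to compare class_indices; a small sketch, assuming the training generator is named train_generator:

# Both should print the same mapping, e.g. {'cats': 0, 'dogs': 1}
print(train_generator.class_indices)
print(test_generator.class_indices)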

And for prediction:

filenames = test_generator.filenames
nb_samples = len(filenames)
# batch_size is 1, so one step per file covers the whole test set
y_predict = model.predict_generator(test_generator, steps=nb_samples)
y_predict

I round this with numpy so that I can finally use the confusion matrix metric:

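The rounding step is not shown here; a minimal sketch of how y_predict_rounded might be produced, assuming y_predict holds sigmoid outputs of shape (n, 1):

import numpy as np

# Threshold the sigmoid outputs at 0.5 and flatten to a 1-D vector of 0/1 labels
y_predict_rounded = np.round(y_predict).astype(int).ravel()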

from sklearn.metrics  import confusion_matrix
cm = confusion_matrix(y_true=test_generator.labels, y_pred=y_predict_rounded)
cm
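
The accuracy mentioned above (0.9 on the first run, 0.5 afterwards) can be read off this matrix; a small sketch, assuming cm is the 2x2 matrix just computed:

import numpy as np

# Overall accuracy = correctly classified samples / all samples
accuracy = np.trace(cm) / cm.sum()
print(accuracy)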

The manual verification is instead:

from tensorflow.keras.preprocessing import image
import numpy as np
import matplotlib.pyplot as plt

def prediction(path_img):
    # Load and preprocess a single image the same way the generator does
    img = image.load_img(path_img, target_size=(150, 150))
    x = image.img_to_array(img)
    x = x / 255.
    x = np.expand_dims(x, axis=0)
    classes = model.predict(x)
    plt.imshow(img)
    if classes > 0.5:
        print(path_img.split('/')[-1] + ' is a dog')
    else:
        print(path_img.split('/')[-1] + ' is a cat')
    return classes

Which I use in the following way:

y_pred_m = []
files=[]
for filename in os.listdir(test_dir):
    file = test_dir+'/'+filename
    for item in os.listdir(file):
        file2 = file+'/'+item
        if file2.split('.')[-1]=='jpg':
            files.append(file2)

And prediction goes:

prediction_array = [prediction(img) for img in files]

np.round(prediction_array, decimals=0)

The expected result is a confusion matrix with an accuracy level similar to training, since verifying each example individually shows no prediction errors; model.predict_generator, however, seems to go wrong.

The problem was that by default flow_from_directory uses shuffle=True. Predictions are correct if shuffle is set to False. However, using the validation dataset to evaluate during training seems to work correctly even though shuffle is True. I have updated the git repository so that these changes are included.

# Flow test images one at a time, without shuffling, using the test_datagen_2 generator
test_generator = test_datagen_2.flow_from_directory(test_dir,
                                                    batch_size=1,
                                                    class_mode='binary',
                                                    target_size=(image_size, image_size),
                                                    shuffle=False)
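
With shuffle=False the order of test_generator.labels matches the order of the predictions, so the earlier evaluation can be repeated; a minimal sketch, assuming model, image_size, and the test_generator defined just above:

import numpy as np
from sklearn.metrics import confusion_matrix

test_generator.reset()  # start from the first file in case the generator was already iterated
y_predict = model.predict_generator(test_generator,
                                    steps=len(test_generator.filenames))
y_predict_rounded = np.round(y_predict).astype(int).ravel()

cm = confusion_matrix(y_true=test_generator.labels, y_pred=y_predict_rounded)
print(cm)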
