
How can I interpret a Scikit-learn confusion matrix?

I am using a confusion matrix to check the performance of my classifier.

I am using Scikit-Learn, and I am a little bit confused. How can I interpret the result from

>>> from sklearn.metrics import confusion_matrix
>>> y_true = [2, 0, 2, 2, 0, 1]
>>> y_pred = [0, 0, 2, 2, 0, 2]
>>> confusion_matrix(y_true, y_pred)
array([[2, 0, 0],
       [0, 0, 1],
       [1, 0, 2]])

How can I decide whether these predicted values are good or not?

The simplest way to decide whether the classifier is good or bad is to calculate an error using one of the standard error measures (for example, the mean squared error). I imagine your example is copied from Scikit-learn's documentation, so I assume you've read the definition.
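For instance, here is a minimal sketch of such a single-number check for the example from your question (accuracy_score is used here alongside the mean squared error mentioned above; note that MSE treats the class labels as numbers, which only makes sense if they are ordinal):

from sklearn.metrics import accuracy_score, mean_squared_error

y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]

print(accuracy_score(y_true, y_pred))      # 0.666... -> 4 of 6 samples classified correctly
print(mean_squared_error(y_true, y_pred))  # 0.833... -> treats labels as numbers, use with care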

We have three classes here: 0, 1 and 2. On the diagonal, the confusion matrix tells you how often a particular class has been predicted correctly. So from the diagonal 2 0 2 we can say that the class with index 0 was classified correctly 2 times, the class with index 1 was never predicted correctly, and the class with index 2 was predicted correctly 2 times.
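As a quick check, these per-class correct counts can be read off the diagonal programmatically (a small sketch using NumPy, with the same data as in the question):

import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]
cm = confusion_matrix(y_true, y_pred)

print(np.diag(cm))  # [2 0 2] -> correct predictions for classes 0, 1 and 2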

Below and above the diagonal you have numbers which tell you how many times a sample of the class with index equal to the element's row was classified as the class with index equal to the element's column. For example, if you look at the first column, below the diagonal you have 0 and 1 (the 1 sits in the lower-left corner of the matrix). That 1 tells you that the class with index 2 (the last row) was once erroneously classified as 0 (the first column). This corresponds to the fact that in your y_true there was one sample with label 2 that was classified as 0; it happened for the first sample.
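In other words, indexing the matrix as cm[true_class, predicted_class] gives that count directly (a tiny sketch, again on the question's data):

from sklearn.metrics import confusion_matrix

cm = confusion_matrix([2, 0, 2, 2, 0, 1], [0, 0, 2, 2, 0, 2])
print(cm[2, 0])  # 1 -> one sample of true class 2 was predicted as class 0
print(cm[1, 2])  # 1 -> the single class-1 sample was predicted as class 2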

If you sum all the numbers in the confusion matrix you get the number of test samples (2 + 2 + 1 + 1 = 6, equal to the length of y_true and y_pred). If you sum the rows you get the number of samples for each true label: as you can verify, there are indeed two 0s, one 1 and three 2s in y_true.
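These sums are easy to verify with NumPy (np.bincount simply counts how often each label occurs):

import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]
cm = confusion_matrix(y_true, y_pred)

print(cm.sum())                             # 6 -> total number of test samples
print(cm.sum(axis=1), np.bincount(y_true))  # [2 1 3] [2 1 3] -> samples per true label
print(cm.sum(axis=0), np.bincount(y_pred))  # [3 0 3] [3 0 3] -> samples per predicted label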

If you then divide each row of the matrix by its sum, you can tell that, for example, the class with label 2 is recognized correctly in about 66% of cases, and in 1/3 of cases it is confused (hence the name) with the class with label 0.
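One way to do this normalization is to divide each row by its sum, which gives the per-class recall; recent scikit-learn versions can also do this directly via confusion_matrix(..., normalize='true'). A sketch of the manual version:

from sklearn.metrics import confusion_matrix

y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]
cm = confusion_matrix(y_true, y_pred)

per_class = cm / cm.sum(axis=1, keepdims=True)  # normalize each row; rows now sum to 1
print(per_class[2])  # [0.333 0. 0.667] -> class 2 right ~66% of the time, confused with class 0 in 1/3 of cases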

TL;DR:

While single-number error measures summarize overall performance, with a confusion matrix you can determine whether (some examples):

  • your classifier just sucks at everything

  • or it handles some classes well and some badly (this gives you a hint to look at that particular part of your data and observe the classifier's behaviour in those cases; see the sketch after this list)

  • it does well overall, but confuses label A with label B quite often. For linear classifiers, for example, you may then want to check whether these classes are linearly separable.

Etc.
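For the second point above, a per-class breakdown is often the quickest way to spot which classes are handled poorly; a minimal sketch using scikit-learn's classification_report:

from sklearn.metrics import classification_report

y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]

# Prints precision, recall and f1-score per class; class 1 is never predicted,
# so its precision is undefined and reported as 0 (with an UndefinedMetricWarning).
print(classification_report(y_true, y_pred))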
