Return tp, tn, fn, fp based on each input element

Question

I have a CSV file with true and predicted labels (4 classes), each associated with an ID. The CSV file looks like this:

task_id,labels_true,labels_pred
76017-126511-18,2,2
76017-126512-18,0,3
76017-126513-18,2,2
76018-126511-18,2,2
76018-126512-18,2,2
76018-126513-18,2,1
76019-126511-18,2,2
76019-126512-18,1,0

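I am using confusion_matrix from sklearn.metrics:

from sklearn.metrics import confusion_matrix

# df is the DataFrame loaded from the CSV file shown above
y_true = df["labels_true"]
y_pred = df["labels_pred"]

cnf_matrix = confusion_matrix(y_true, y_pred, labels=[0, 1, 2, 3])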

It returns an array as follows:

[[ 554    1   28    0]
 [  15 1375   43    0]
 [  42  476 2263    0]
 [   0    0    0    0]]

My aim is to return a list with each element ID associated with the respective tp, tn, fp, fn values like this:

task_id,labels_true,labels_pred, cm
76017-126511-18,2,2, tp 
76017-126513-18,2,2, tp
76018-126511-18,2,2, tp

Answer 1

It's a multi-class confusion matrix. True/false positives and negatives are defined for binary classification problems. What you can do is encode your labels as binary values (for example, classes 1, 2, 3 encoded as 1) and recalculate the confusion matrix.
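A minimal sketch of that idea, assuming class 0 is treated as the negative class and classes 1, 2, 3 as the positive class. The choice of negative class and the helper names `to_binary` and `binary_outcome` are hypothetical, not something from the question. Once the labels are binary, every row gets exactly one of tp, tn, fp, fn:

```python
# Sketch only: which class counts as "negative" is an assumption, not part of the question.
NEGATIVE_CLASS = 0

def to_binary(label):
    """Map a 4-class label to 0 (negative) or 1 (positive)."""
    return 0 if label == NEGATIVE_CLASS else 1

def binary_outcome(y_true, y_pred):
    """Return 'tp', 'tn', 'fp' or 'fn' for one sample after binarization."""
    t, p = to_binary(y_true), to_binary(y_pred)
    if t == 1 and p == 1:
        return "tp"
    if t == 0 and p == 0:
        return "tn"
    return "fp" if p == 1 else "fn"

# A few rows from the CSV sample in the question: (task_id, labels_true, labels_pred)
rows = [
    ("76017-126511-18", 2, 2),
    ("76017-126512-18", 0, 3),
    ("76018-126513-18", 2, 1),
    ("76019-126512-18", 1, 0),
]

binary_results = [(task_id, t, p, binary_outcome(t, p)) for task_id, t, p in rows]
for row in binary_results:
    print(",".join(str(v) for v in row))
```

Note that under this encoding a true 2 predicted as 1 counts as tp, because both labels map to the positive class; the distinction among classes 1, 2 and 3 is lost.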

Answer 2

TL;DR: For multi-class cases, this is not possible.

As already suggested, the very notions of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN) come from binary classification settings. They can indeed be used in multi-class classification, as shown here, but in such cases they are not a straightforward extension of the binary case, which makes what you ask here impossible.

In multi-class classification, all these notions are defined and calculated per class. This makes it impossible to uniquely identify a sample as being in one and only one of these categories (TP, FP, TN, FN).
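To illustrate what "per class" means, here is a rough sketch that derives the four counts for each class from the confusion matrix posted in the question, treating each class in turn as "positive" and all others as "negative". The helper name `per_class_counts` is hypothetical, and the code uses plain Python lists rather than the NumPy array that scikit-learn returns:

```python
# Confusion matrix from the question; rows are true labels, columns are predicted labels.
cnf_matrix = [
    [554, 1, 28, 0],
    [15, 1375, 43, 0],
    [42, 476, 2263, 0],
    [0, 0, 0, 0],
]

def per_class_counts(matrix):
    """Return {class: {'tp', 'fp', 'fn', 'tn'}} using a one-vs-rest view of each class."""
    total = sum(sum(row) for row in matrix)
    counts = {}
    for k in range(len(matrix)):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                 # true label k, predicted something else
        fp = sum(row[k] for row in matrix) - tp  # predicted k, true label something else
        tn = total - tp - fn - fp                # neither true nor predicted label is k
        counts[k] = {"tp": tp, "fp": fp, "fn": fn, "tn": tn}
    return counts

class_counts = per_class_counts(cnf_matrix)
for k, c in class_counts.items():
    print(k, c)
```

Each class gets its own set of four counts, so the same sample is counted once for every class. scikit-learn also provides `multilabel_confusion_matrix`, which returns one 2x2 matrix per class in the same one-vs-rest spirit.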

Let's demonstrate this with some examples, using your case (4 classes [0, 1, 2, 3]).

Take a misclassified sample first, e.g.:

True label:      0
Predicted label: 3

From the point of view (POV) of class 0, this is a False Negative (FN): the prediction is not 0, as it should be
From the POV of class 1, this is a True Negative (TN): it is not 1, and it has correctly been classified as not 1
From the POV of class 2, this is again a True Negative (TN): it is not 2, and it has correctly been classified as not 2
From the POV of class 3, this is a False Positive (FP): it has been wrongly classified as 3 without being so

The same holds for a correct classification, say:

True label:      2
Predicted label: 2

From the POV of class 0, this is a True Negative (TN): it is not 0, and it has correctly been classified as not 0
From the POV of class 1, this is a True Negative (TN): it is not 1, and it has correctly been classified as not 1
From the POV of class 2, this is a True Positive (TP)
From the POV of class 3, this is a True Negative (TN): it is not 3, and it has correctly been classified as not 3
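Both walkthroughs above follow the same rule, which can be sketched in a few lines. The function names `outcome_for_class` and `outcomes` are hypothetical helpers written for this illustration:

```python
CLASSES = [0, 1, 2, 3]

def outcome_for_class(y_true, y_pred, cls):
    """Outcome of one sample from the point of view of class `cls`."""
    if y_true == cls:
        return "TP" if y_pred == cls else "FN"
    return "FP" if y_pred == cls else "TN"

def outcomes(y_true, y_pred, classes=CLASSES):
    """One outcome per class for a single sample."""
    return {cls: outcome_for_class(y_true, y_pred, cls) for cls in classes}

misclassified = outcomes(0, 3)  # true label 0, predicted 3
correct = outcomes(2, 2)        # true label 2, predicted 2
print(misclassified)  # {0: 'FN', 1: 'TN', 2: 'TN', 3: 'FP'}
print(correct)        # {0: 'TN', 1: 'TN', 2: 'TP', 3: 'TN'}
```

A single sample therefore maps to one outcome per class, not to one outcome overall. If a per-row annotation is still wanted, one option would be to store one such column per class instead of a single cm column.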

Given this exposition, it should hopefully be clear that what you ask is actually not possible in the multi-class case.

2 answers

solution1
0 2020-11-18 12:52:09

solution2
0 ACCEPTED 2020-11-19 01:48:18