Return tp, tn, fn, fp based on each input element

I have a csv file with true and predicted labels (4 classes), each associated with an ID. The csv file looks like this:

task_id,labels_true,labels_pred
76017-126511-18,2,2
76017-126512-18,0,3
76017-126513-18,2,2
76018-126511-18,2,2
76018-126512-18,2,2
76018-126513-18,2,1
76019-126511-18,2,2
76019-126512-18,1,0

I am using the confusion matrix from sklearn.metrics:

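from sklearn.metrics import confusion_matrix

# df is the DataFrame loaded from the csv file shown above
y_true = df["labels_true"]
y_pred = df["labels_pred"]

cnf_matrix = confusion_matrix(y_true, y_pred, labels=[0, 1, 2, 3])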

It returns an array as follows:

[[ 554    1   28    0]
 [  15 1375   43    0]
 [  42  476 2263    0]
 [   0    0    0    0]]

My aim is to return a list in which each element ID is associated with its respective tp, tn, fp, or fn value, like this:

task_id,labels_true,labels_pred,cm
76017-126511-18,2,2,tp
76017-126513-18,2,2,tp
76018-126511-18,2,2,tp

It's a multi-class confusion matrix. True/false positives are used for binary classification problems. What you can do is encode your labels as binary values (for example, classes 1, 2, 3 encoded as 1) and recalculate the confusion matrix.
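A minimal sketch of that binarization, assuming class 0 is kept as the negative class and classes 1, 2, 3 are merged into a single positive class. The helper name `binary_outcome` is hypothetical, the rows are the sample from the question, and plain Python is used so the sketch has no dependencies.

```python
# Sample rows from the question: (task_id, labels_true, labels_pred).
rows = [
    ("76017-126511-18", 2, 2),
    ("76017-126512-18", 0, 3),
    ("76017-126513-18", 2, 2),
    ("76018-126511-18", 2, 2),
    ("76018-126512-18", 2, 2),
    ("76018-126513-18", 2, 1),
    ("76019-126511-18", 2, 2),
    ("76019-126512-18", 1, 0),
]

# Assumption: everything except class 0 counts as "positive".
POSITIVE_CLASSES = {1, 2, 3}


def binary_outcome(true_label, pred_label, positive=POSITIVE_CLASSES):
    """Return 'tp', 'tn', 'fp' or 'fn' after collapsing labels to binary."""
    is_pos_true = true_label in positive
    is_pos_pred = pred_label in positive
    if is_pos_true and is_pos_pred:
        return "tp"
    if not is_pos_true and not is_pos_pred:
        return "tn"
    if is_pos_pred:
        return "fp"  # predicted positive, actually negative
    return "fn"      # predicted negative, actually positive


# Attach the binary outcome to every row and print it in csv form.
labelled = [(task_id, t, p, binary_outcome(t, p)) for task_id, t, p in rows]
for row in labelled:
    print(*row, sep=",")
```

Note that after merging, a row such as true 2, predicted 1 counts as tp even though the original prediction was wrong; the binarization discards that distinction.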

TL;DR: For multi-class cases, this is not possible.


As already suggested, the very notions of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN) come from binary classification settings; they can indeed be used in multi-class classification, as shown here, but in such cases the notions are not a straightforward extension of the binary case, making what you ask here actually impossible.

In multi-class classification, all these notions are defined and calculated per class. This makes it impossible to uniquely identify a sample as belonging to one and only one of these categories (TP, FP, TN, FN).
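To illustrate the per-class definition, the four counts can be read off the confusion matrix from the question one class at a time. This is a sketch in plain Python on the matrix values written out as nested lists (in sklearn's convention, rows are true labels and columns are predicted labels); the helper name `per_class_counts` is hypothetical.

```python
# Confusion matrix from the question (rows: true labels, columns: predicted labels).
cm = [
    [554,    1,   28, 0],
    [ 15, 1375,   43, 0],
    [ 42,  476, 2263, 0],
    [  0,    0,    0, 0],
]


def per_class_counts(cm):
    """Return {class_index: {'tp', 'fp', 'fn', 'tn'}} for a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    counts = {}
    for k in range(len(cm)):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                 # true class k, predicted as something else
        fp = sum(row[k] for row in cm) - tp  # predicted k, true class is something else
        tn = total - tp - fn - fp            # samples not involving class k at all
        counts[k] = {"tp": tp, "fp": fp, "fn": fn, "tn": tn}
    return counts


counts = per_class_counts(cm)
for k, c in counts.items():
    print(k, c)
```

Every sample is counted once per class here, so for each class the four counts add up to the total number of samples. sklearn.metrics also provides `multilabel_confusion_matrix`, which returns one 2x2 matrix per class.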

Let's demonstrate this with some examples, using your case (4 classes [0, 1, 2, 3]).

Take a misclassified sample first, e.g.:

True label:      0
Predicted label: 3
  • From the point of view (POV) of class 0, this is a False Negative (FN): the prediction is not 0, as it should be
  • From the POV of class 1, this is a True Negative (TN): it is not 1, and it has correctly been classified as not 1
  • From the POV of class 2, this is again a True Negative (TN): it is not 2, and it has correctly been classified as not 2
  • From the POV of class 3, this is a False Positive (FP): it has been wrongly classified as 3 although it is not 3

The same holds for a correctly classified sample, say:

True label:      2
Predicted label: 2
  • From the POV of class 0, this is a True Negative (TN): it is not 0, and it has correctly been classified as not 0
  • From the POV of class 1, this is a True Negative (TN): it is not 1, and it has correctly been classified as not 1
  • From the POV of class 2, this is a True Positive (TP)
  • From the POV of class 3, this is a True Negative (TN): it is not 3, and it has correctly been classified as not 3
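The two walkthroughs above can be reproduced with a small helper that reports the outcome of a single sample from each class's point of view. This is only an illustrative sketch; the name `outcomes_per_class` is hypothetical and not part of sklearn.

```python
CLASSES = [0, 1, 2, 3]


def outcomes_per_class(true_label, pred_label, classes=CLASSES):
    """Map each class to the outcome of one sample from that class's point of view."""
    result = {}
    for k in classes:
        if true_label == k and pred_label == k:
            result[k] = "tp"
        elif true_label == k:
            result[k] = "fn"  # should have been k, but something else was predicted
        elif pred_label == k:
            result[k] = "fp"  # predicted as k, but it is not k
        else:
            result[k] = "tn"  # not k, and not predicted as k
    return result


print(outcomes_per_class(0, 3))  # the misclassified example
print(outcomes_per_class(2, 2))  # the correctly classified example
```

A single sample therefore yields one outcome per class rather than one outcome overall, so any per-row label has to be stored as several columns (one per class) or relative to one chosen class.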

Given this exposition, it should hopefully be clear that what you ask is actually not possible in the multi-class case.
