Calculating precision and recall from a confusion matrix

Question

Suppose I have a confusion matrix like the one below. How can I calculate precision and recall?

Answer 1

First, your matrix is arranged upside down. You want to arrange your labels so that the true positives sit on the diagonal [(0,0),(1,1),(2,2)]; this is the arrangement you will find in confusion matrices generated by sklearn and other packages.

Once we have things sorted in the right direction, we can take a page from this answer and say that:

True positives are on the diagonal.
False positives are the column-wise sums, without the diagonal.
False negatives are the row-wise sums, without the diagonal.
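Written out as formulas, the three rules above and the standard per-class definitions of precision and recall look as follows. This is only a sketch: the symbol $C$ for the confusion matrix and the index names $i$, $j$ are assumptions introduced here, with rows taken as true labels and columns as predictions (the sklearn arrangement mentioned above).

```latex
\begin{aligned}
TP_i &= C_{ii}, \qquad FP_i = \sum_{j \neq i} C_{ji}, \qquad FN_i = \sum_{j \neq i} C_{ij} \\
\text{precision}_i &= \frac{TP_i}{TP_i + FP_i} = \frac{C_{ii}}{\sum_j C_{ji}} \\
\text{recall}_i &= \frac{TP_i}{TP_i + FN_i} = \frac{C_{ii}}{\sum_j C_{ij}}
\end{aligned}
```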

Then we take some formulas from the sklearn docs for precision and recall, and put it all into code:

import numpy as np

# rows = true labels, columns = predicted labels
cm = np.array([[2,1,0], [3,4,5], [6,7,8]])
true_pos = np.diag(cm)
false_pos = np.sum(cm, axis=0) - true_pos
false_neg = np.sum(cm, axis=1) - true_pos

# np.sum adds the per-class values into a single number
precision = np.sum(true_pos / (true_pos + false_pos))
recall = np.sum(true_pos / (true_pos + false_neg))

Since we remove the true positives to define the false positives/negatives only to add them back, we can simplify further by skipping a couple of steps:

true_pos = np.diag(cm)
precision = np.sum(true_pos / np.sum(cm, axis=0))
recall = np.sum(true_pos / np.sum(cm, axis=1))

Answer 2

I don't think you need the summation at the end. Without it, your method is correct; it gives precision and recall for each class.

If you intend to calculate average precision and recall, then you have two options: micro- and macro-average.

Read more here: http://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html
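To make the difference between the two averages concrete, here is a minimal sketch that reuses the example matrix from Answer 1. The variable names are illustrative only, and it assumes rows are true labels and columns are predictions.

```python
import numpy as np

# Example matrix from Answer 1; rows = true labels, columns = predictions.
cm = np.array([[2, 1, 0], [3, 4, 5], [6, 7, 8]])

true_pos = np.diag(cm)
pred_totals = cm.sum(axis=0)   # TP + FP for each class
true_totals = cm.sum(axis=1)   # TP + FN for each class

# Per-class values: one entry per class, no summation.
precision_per_class = true_pos / pred_totals
recall_per_class = true_pos / true_totals

# Macro average: unweighted mean of the per-class values.
macro_precision = precision_per_class.mean()
macro_recall = recall_per_class.mean()

# Micro average: pool the counts over all classes, then divide once.
micro_precision = true_pos.sum() / pred_totals.sum()
micro_recall = true_pos.sum() / true_totals.sum()

print("per class:", precision_per_class, recall_per_class)
print("macro:", macro_precision, macro_recall)
print("micro:", micro_precision, micro_recall)
```

In a single-label multi-class setting both pooled denominators equal the total number of samples, so micro precision and micro recall come out identical and equal to the overall accuracy.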

Answer 3

For the sake of completeness and future reference: given a list of ground-truth labels (gt) and predictions (pd), the following code snippet computes the confusion matrix and then calculates precision and recall.

from sklearn.metrics import confusion_matrix

gt = [1,1,2,2,1,0]
pd = [1,1,1,1,2,0]

cm = confusion_matrix(gt, pd)

#rows = gt, col = pred

#compute tp, tp_and_fn and tp_and_fp w.r.t all classes
tp_and_fn = cm.sum(1)
tp_and_fp = cm.sum(0)
tp = cm.diagonal()

precision = tp / tp_and_fp
recall = tp / tp_and_fn
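As a sanity check, the manual result can be compared against sklearn's own `precision_score` and `recall_score` with `average=None`, which returns one value per class. This is a sketch that assumes scikit-learn is installed; the prediction list is renamed to `pred` here only to avoid clashing with the usual pandas alias.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

gt = [1, 1, 2, 2, 1, 0]
pred = [1, 1, 1, 1, 2, 0]   # same values as `pd` above

cm = confusion_matrix(gt, pred)   # rows = gt, columns = pred
manual_precision = cm.diagonal() / cm.sum(axis=0)
manual_recall = cm.diagonal() / cm.sum(axis=1)

# average=None asks sklearn for per-class values instead of a single average
sk_precision = precision_score(gt, pred, average=None)
sk_recall = recall_score(gt, pred, average=None)

print(manual_precision, sk_precision)
print(manual_recall, sk_recall)
```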

Answer 4

Given:

a hypothetical confusion matrix (cm)

cm = 
[[ 970    1    2    1    1    6   10    0    5    0]
 [   0 1105    7    3    1    6    0    3   16    0]
 [   9   14  924   19   18    3   13   12   24    4]
 [   3   10   35  875    2   34    2   14   19   19]
 [   0    3    6    0  903    0    9    5    4   32]
 [   9    6    4   28   10  751   17    5   24    9]
 [   7    2    6    0    9   13  944    1    7    0]
 [   3   11   17    3   16    3    0  975    2   34]
 [   5   38   10   16    7   28    5    4  830   20]
 [   5    3    5   13   39   10    2   34    5  853]]

Goal:

Precision and recall for each class, using map() to perform the element-wise list division.

from operator import truediv
import numpy as np

tp = np.diag(cm)
prec = list(map(truediv, tp, np.sum(cm, axis=0)))
rec = list(map(truediv, tp, np.sum(cm, axis=1)))
print('Precision: {}\nRecall: {}'.format(prec, rec))

Result:

Precision: [0.959, 0.926, 0.909, 0.913, 0.896, 0.880, 0.941, 0.925, 0.886, 0.877]
Recall:    [0.972, 0.968, 0.888, 0.863, 0.937, 0.870, 0.954, 0.916, 0.861, 0.880]

Please note: 10 classes, 10 precisions and 10 recalls.

Answer 5

Take a look at the answer posted by @Aaditya Ura: https://stackoverflow.com/a/63922083/11534375

You can use a custom library called Disarray. It helps to generate all the required metrics from a confusion matrix.

Answer 6

Agreeing with gruangly and EuWern, I modified PabTorre's solution accordingly to generate precision and recall per class.

Also, given my use case (NER), where a model could:

Never predict a class that is present in the input text (i.e. a column of zeros: TP: 0, FP: 0, FN: all), causing a nan in the precision array, or
Predict a class that is completely absent from the input text (i.e. a row of zeros: TP: 0, FN: 0, FP: all), causing a nan in the recall array...

I wrap the array with numpy.nan_to_num() to convert any nan to zero. This is not a mathematical decision but a functional, per-use-case decision on how to handle never-predicted or never-occurring classes.

import numpy
confusion_matrix = numpy.array([
        [ 5,  0,  0,  0,  0,  3], 
        [ 0,  2,  0,  1,  0,  5],
        [ 0,  0,  0,  3,  5,  7],
        [ 0,  0,  0,  9,  0,  0],
        [ 0,  0,  0,  9, 32,  3],
        [ 0,  0,  0,  0,  0,  0]
        ])
true_positives = numpy.diag(confusion_matrix)
false_positives = numpy.sum(confusion_matrix, axis=0) - true_positives
false_negatives = numpy.sum(confusion_matrix, axis=1) - true_positives

precision = numpy.nan_to_num(numpy.divide(true_positives, (true_positives + false_positives)))
recall = numpy.nan_to_num(numpy.divide(true_positives, (true_positives + false_negatives)))

print(true_positives)       # [ 5  2  0  9 32  0 ]
print(false_positives)      # [ 0  0  0 13  5 18 ]
print(false_negatives)      # [ 3  6 15  0 12  0 ]
print(precision)            # [1. 1. 0. 0.40909091 0.86486486 0. ]
print(recall)               # [0.625 0.25 0. 1. 0.72727273 0. ]
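An alternative sketch that gives the same zero-filled result without triggering NumPy's divide-by-zero warning: `numpy.divide` accepts `out` and `where` arguments, so the division can simply be skipped where the denominator is zero. The helper name `safe_divide` is hypothetical, introduced only for this illustration.

```python
import numpy as np

def safe_divide(numerator, denominator):
    """Element-wise division that leaves 0.0 wherever the denominator is 0."""
    numerator = np.asarray(numerator, dtype=float)
    denominator = np.asarray(denominator, dtype=float)
    result = np.zeros_like(numerator)
    # Positions where `where` is False keep the value already stored in `out`.
    np.divide(numerator, denominator, out=result, where=denominator != 0)
    return result

# Same matrix as above: column 2 and row 5 are all zeros.
cm = np.array([
    [5, 0, 0, 0,  0, 3],
    [0, 2, 0, 1,  0, 5],
    [0, 0, 0, 3,  5, 7],
    [0, 0, 0, 9,  0, 0],
    [0, 0, 0, 9, 32, 3],
    [0, 0, 0, 0,  0, 0],
])

tp = np.diag(cm)
precision = safe_divide(tp, cm.sum(axis=0))   # TP / (TP + FP)
recall = safe_divide(tp, cm.sum(axis=1))      # TP / (TP + FN)

print(precision)
print(recall)
```

As with nan_to_num(), mapping the undefined cases to zero is a choice for the use case at hand, not a mathematical result.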

Answer 7

import numpy as np

n_classes=3
cm = np.array([[0,1,2],
               [5,4,3],
               [8,7,6]])

sp = []
f1 = []
gm = []
sens = []
acc= []

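for c in range(n_classes):
    # one-vs-rest counts for class c (columns are treated as predictions)
    tp = cm[c,c]
    fp = sum(cm[:,c]) - cm[c,c]
    fn = sum(cm[c,:]) - cm[c,c]
    # everything that lies outside both row c and column c
    tn = sum(np.delete(sum(cm)-cm[c,:],c))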

    recall = tp/(tp+fn)
    precision = tp/(tp+fp)
    accuracy = (tp+tn)/(tp+fp+fn+tn)
    specificity = tn/(tn+fp)
    f1_score = 2*((precision*recall)/(precision+recall))
    g_mean = np.sqrt(recall * specificity)
    sp.append(specificity)
    f1.append(f1_score)
    gm.append(g_mean)
    sens.append(recall)
    acc.append(tp)  # collect tp; overall accuracy is computed from these after the loop

    print("for class {}: recall {}, specificity {}\
          precision {}, f1 {}, gmean {}".format(c,round(recall,4), round(specificity,4), round(precision,4),round(f1_score,4),round(g_mean,4)))
print("sp: ", np.average(sp))
print("f1: ", np.average(f1))
print("gm: ", np.average(gm))
print("sens: ", np.average(sens))
print("accuracy: ", np.sum(acc)/np.sum(cm))
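The same per-class quantities can also be computed without the loop. The following is a vectorized sketch on the same matrix, not a drop-in replacement. It uses the count form of F1, 2·TP / (2·TP + FP + FN), which is algebraically equal to 2·P·R / (P + R) but avoids the 0/0 that arises when a class has precision and recall both equal to zero (class 0 in this matrix).

```python
import numpy as np

cm = np.array([[0, 1, 2],
               [5, 4, 3],
               [8, 7, 6]])

tp = np.diag(cm)
fp = cm.sum(axis=0) - tp          # column sums without the diagonal
fn = cm.sum(axis=1) - tp          # row sums without the diagonal
tn = cm.sum() - (tp + fp + fn)    # everything outside row c and column c

recall = tp / (tp + fn)
precision = tp / (tp + fp)
specificity = tn / (tn + fp)
f1 = 2 * tp / (2 * tp + fp + fn)  # count form of the F1 score
g_mean = np.sqrt(recall * specificity)
overall_accuracy = tp.sum() / cm.sum()

print("tn:", tn)
print("f1:", f1)
print("accuracy:", overall_accuracy)
```

The count form is still undefined for a class that has no true samples and no predictions at all, so that case needs its own handling.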
