Scikit-learn: How to obtain True Positive, True Negative, False Positive and False Negative

Question

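My problem:

I have a dataset which is a large JSON file. I read it and store it in the trainList variable.

Next, I pre-process it in order to be able to work with it.

Once I have done that, I start the classification:

I use the kfold cross validation method in order to obtain the mean accuracy and train a classifier.
I make the predictions and obtain the accuracy and confusion matrix of that fold.
After this, I would like to obtain the True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN) values. I will use these parameters to obtain the Sensitivity and Specificity.

Finally, I would use this to put in HTML in order to show a chart with the TPs of each label.

Code:

The variables I have for the moment: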

trainList #It is a list with all the data of my dataset in JSON form
labelList #It is a list with all the labels of my data

Most of the method:

#I transform the data from JSON form to a numerical one
X=vec.fit_transform(trainList)

#I scale the matrix (don't know why but without it, it makes an error)
X=preprocessing.scale(X.toarray())

#I generate a KFold in order to make cross validation
#KFold comes from sklearn.model_selection (scikit-learn >= 0.18)
kf = KFold(n_splits=10, shuffle=True, random_state=1)

#I start the cross validation
for train_indices, test_indices in kf.split(X):
    X_train=[X[ii] for ii in train_indices]
    X_test=[X[ii] for ii in test_indices]
    y_train=[labelList[ii] for ii in train_indices]
    y_test=[labelList[ii] for ii in test_indices]

    #I train the classifier
    trained=qda.fit(X_train,y_train)

    #I make the predictions
    predicted=qda.predict(X_test)

    #I obtain the accuracy of this fold
    ac=accuracy_score(y_test,predicted)

    #I obtain the confusion matrix
    cm=confusion_matrix(y_test, predicted)

    #I should calculate the TP,TN, FP and FN 
    #I don't know how to continue

Answer 1

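For the multi-class case, everything you need can be found from the confusion matrix. For each class, TP is the diagonal entry of the matrix, FN is the sum of the rest of that class's row, FP is the sum of the rest of its column, and TN is the sum of all remaining entries.

Using pandas/numpy, you can do this for all classes at once like so: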

# cm is the confusion matrix; np.asarray accepts a NumPy array or a pandas DataFrame
cm = np.asarray(cm)
FP = cm.sum(axis=0) - np.diag(cm)
FN = cm.sum(axis=1) - np.diag(cm)
TP = np.diag(cm)
TN = cm.sum() - (FP + FN + TP)

# Sensitivity, hit rate, recall, or true positive rate
TPR = TP/(TP+FN)
# Specificity or true negative rate
TNR = TN/(TN+FP) 
# Precision or positive predictive value
PPV = TP/(TP+FP)
# Negative predictive value
NPV = TN/(TN+FN)
# Fall out or false positive rate
FPR = FP/(FP+TN)
# False negative rate
FNR = FN/(TP+FN)
# False discovery rate
FDR = FP/(TP+FP)

# Overall accuracy
ACC = (TP+TN)/(TP+FP+FN+TN)
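
As a quick check of the per-class formulas, here is a minimal sketch on a made-up 3x3 confusion matrix. The numbers are purely illustrative (not taken from the question's data), and it assumes rows are true labels and columns are predictions, which is scikit-learn's convention.

```python
import numpy as np

# Hypothetical 3-class confusion matrix: rows = true class, columns = predicted class
cm = np.array([[5, 1, 0],
               [2, 6, 1],
               [0, 2, 7]])

TP = np.diag(cm)                 # correctly predicted samples of each class
FP = cm.sum(axis=0) - TP         # rest of each column
FN = cm.sum(axis=1) - TP         # rest of each row
TN = cm.sum() - (TP + FP + FN)   # everything else

sensitivity = TP / (TP + FN)
specificity = TN / (TN + FP)

for k in range(len(cm)):
    print(f"class {k}: TP={TP[k]} FP={FP[k]} FN={FN[k]} TN={TN[k]} "
          f"sensitivity={sensitivity[k]:.2f} specificity={specificity[k]:.2f}")
```

For every class the four counts add up to the total number of samples, which is a handy sanity check.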

Answer 2

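If you have two lists holding the predicted and actual values, as it appears you do, you can pass them to a function that calculates TP, FP, TN, FN with something like this: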

def perf_measure(y_actual, y_hat):
    TP = 0
    FP = 0
    TN = 0
    FN = 0

    for i in range(len(y_hat)):
        if y_actual[i] == y_hat[i] == 1:
            TP += 1
        if y_hat[i] == 1 and y_actual[i] != y_hat[i]:
            FP += 1
        if y_actual[i] == y_hat[i] == 0:
            TN += 1
        if y_hat[i] == 0 and y_actual[i] != y_hat[i]:
            FN += 1

    return (TP, FP, TN, FN)

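From here I think you will be able to calculate the rates of interest to you, and other performance measures like specificity and sensitivity.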
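
For instance, sensitivity and specificity follow directly from the four counts. The sketch below is only an illustration: the helper name sensitivity_specificity is hypothetical, the example counts are made up, and returning None for an empty denominator is one possible choice among several.

```python
def sensitivity_specificity(tp, fp, tn, fn):
    """Return (sensitivity, specificity); None where the denominator is zero."""
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity

# Example counts, passed in (TP, FP, TN, FN) order
sens, spec = sensitivity_specificity(3, 3, 4, 1)
print(sens, spec)
```

The zero check matters for folds in which one of the two classes never occurs.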

Answer 3

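According to the scikit-learn documentation,

http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html#sklearn.metrics.confusion_matrix

By definition, a confusion matrix C is such that C[i, j] is equal to the number of observations known to be in group i but predicted to be in group j.

Thus in binary classification, the count of true negatives is C[0,0], false negatives is C[1,0], true positives is C[1,1] and false positives is C[0,1].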

CM = confusion_matrix(y_true, y_pred)

TN = CM[0][0]
FN = CM[1][0]
TP = CM[1][1]
FP = CM[0][1]
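
To make the C[i, j] definition concrete, here is a small pure-Python sketch that builds the same 2x2 layout by hand. The helper name binary_confusion_matrix and the sample labels are made up for illustration, and it assumes the labels are exactly 0 and 1.

```python
def binary_confusion_matrix(y_true, y_pred):
    """Build C where C[i][j] counts samples with true label i and predicted label j."""
    C = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        C[t][p] += 1
    return C

y_true = [1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

C = binary_confusion_matrix(y_true, y_pred)
TN, FP = C[0]   # row 0: samples whose true label is 0
FN, TP = C[1]   # row 1: samples whose true label is 1
print(C, TN, FP, FN, TP)
```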

Answer 4

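You can obtain all of the parameters from the confusion matrix. The structure of the confusion matrix (which is a 2x2 matrix) is as follows (assuming the first index is related to the positive label, and the rows are related to the true label):

TP|FN
FP|TN

So

TP = cm[0][0]
FN = cm[0][1]
FP = cm[1][0]
TN = cm[1][1]

Note that scikit-learn sorts the labels by default, so for labels 0 and 1 the negative class comes first and the layout is TN|FP over FN|TP. Pass labels=[1, 0] to confusion_matrix to get the layout shown above.

For more details, see https://en.wikipedia.org/wiki/Confusion_matrix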
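
The two 2x2 layouts differ only in the order of rows and columns, so one can be turned into the other by reversing both. A minimal sketch with made-up counts (the variable names are just for illustration):

```python
# Counts laid out with the negative class first: [[TN, FP], [FN, TP]]
cm_negative_first = [[4, 3],
                     [1, 3]]

# Reverse the row order and the column order to put the positive class first
cm_positive_first = [row[::-1] for row in cm_negative_first[::-1]]

(TP, FN), (FP, TN) = cm_positive_first
print(cm_positive_first)
```

Checking which layout a matrix uses before indexing into it avoids silently swapping TP with TN and FP with FN.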

Answer 5

The one-liner to get true positives etc. out of the confusion matrix is to ravel it:

from sklearn.metrics import confusion_matrix

y_true = [1, 1, 0, 0]
y_pred = [1, 0, 1, 0]   

tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(tn, fp, fn, tp)  # 1 1 1 1

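One should set the labels parameter in case the data contains only a single case, e.g. only true positives. Setting labels correctly ensures that the confusion matrix has a 2x2 shape.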
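
A minimal sketch of that single-case situation, using made-up data in which every sample is a true positive. Because labels=[0, 1] is passed explicitly, the matrix keeps its 2x2 shape and can still be unpacked into four values.

```python
from sklearn.metrics import confusion_matrix

# Every sample is a true positive, so only label 1 ever appears
y_true = [1, 1, 1]
y_pred = [1, 1, 1]

cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()
print(cm.shape, tn, fp, fn, tp)
```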

Answer 6

In the scikit-learn 'metrics' library there is a confusion_matrix method which gives you the desired output.

You can use any classifier that you want. Here I used KNeighbors as an example.

from sklearn import metrics, neighbors

clf = neighbors.KNeighborsClassifier()

X_test = ...
y_test = ...

# clf must be fitted on training data (clf.fit(...)) before predict is called
expected = y_test
predicted = clf.predict(X_test)

conf_matrix = metrics.confusion_matrix(expected, predicted)

>>> print(conf_matrix)
[[1403   87]
 [  56 3159]]

Docs: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html#sklearn.metrics.confusion_matrix

Answer 7

I wrote a version that works using only numpy. I hope it helps you.

import numpy as np

def perf_metrics_2X2(yobs, yhat):
    """
    Returns the specificity, sensitivity, positive predictive value, and 
    negative predictive value 
    of a 2X2 table.

    where:
    0 = negative case
    1 = positive case

    Parameters
    ----------
    yobs :  array of positive and negative ``observed`` cases
    yhat : array of positive and negative ``predicted`` cases

    Returns
    -------
    sensitivity  = TP / (TP+FN)
    specificity  = TN / (TN+FP)
    pos_pred_val = TP/ (TP+FP)
    neg_pred_val = TN/ (TN+FN)

    Author: Julio Cardenas-Rodriguez
    """
    # yobs and yhat must be NumPy arrays of the same length
    TP = np.sum((yobs == 1) & (yhat == 1))
    TN = np.sum((yobs == 0) & (yhat == 0))
    FP = np.sum((yobs == 0) & (yhat == 1))
    FN = np.sum((yobs == 1) & (yhat == 0))

    sensitivity  = TP / (TP+FN)
    specificity  = TN / (TN+FP)
    pos_pred_val = TP/ (TP+FP)
    neg_pred_val = TN/ (TN+FN)

    return sensitivity, specificity, pos_pred_val, neg_pred_val

Answer 8

Just in case someone is looking for the same thing in a MULTI-CLASS example

def perf_measure(y_actual, y_pred):
    # Sorted list so that position k in TP/FP/TN/FN matches class_id[k]
    class_id = sorted(set(y_actual).union(set(y_pred)))
    TP = []
    FP = []
    TN = []
    FN = []

    for index, _id in enumerate(class_id):
        TP.append(0)
        FP.append(0)
        TN.append(0)
        FN.append(0)
        for i in range(len(y_pred)):
            # One-vs-rest: _id is the positive class, every other class is negative
            if y_actual[i] == _id and y_pred[i] == _id:
                TP[index] += 1
            if y_actual[i] != _id and y_pred[i] == _id:
                FP[index] += 1
            if y_actual[i] != _id and y_pred[i] != _id:
                TN[index] += 1
            if y_actual[i] == _id and y_pred[i] != _id:
                FN[index] += 1

    return class_id, TP, FP, TN, FN

Answer 9

You can try sklearn.metrics.classification_report as below:

from sklearn.metrics import classification_report
y_true = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0]

print(classification_report(y_true, y_pred))

Output:

         precision    recall  f1-score   support

      0       0.80      0.57      0.67         7
      1       0.50      0.75      0.60         4

      avg / total       0.69      0.64      0.64        11
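
If only sensitivity and specificity are needed, the recall values can also be obtained directly. A minimal sketch, assuming binary 0/1 labels with 1 as the positive class: recall_score with pos_label=1 gives the sensitivity, and with pos_label=0 it gives the specificity.

```python
from sklearn.metrics import recall_score

y_true = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0]

# Recall of the positive class is the sensitivity (true positive rate)
sensitivity = recall_score(y_true, y_pred, pos_label=1)
# Recall of the negative class is the specificity (true negative rate)
specificity = recall_score(y_true, y_pred, pos_label=0)
print(sensitivity, specificity)
```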

Answer 10

In scikit-learn version 0.22, you can do it like this:

from sklearn.metrics import multilabel_confusion_matrix

y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]

mcm = multilabel_confusion_matrix(y_true, y_pred,labels=["ant", "bird", "cat"])

tn = mcm[:, 0, 0]
tp = mcm[:, 1, 1]
fn = mcm[:, 1, 0]
fp = mcm[:, 0, 1]
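
Building on the same example data, the per-class arrays can be turned into per-class sensitivity and specificity. This is only a sketch: it assumes every class has at least one positive and one negative sample, since otherwise a denominator would be zero.

```python
from sklearn.metrics import multilabel_confusion_matrix

y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]
labels = ["ant", "bird", "cat"]

# One 2x2 matrix per label, each laid out as [[TN, FP], [FN, TP]]
mcm = multilabel_confusion_matrix(y_true, y_pred, labels=labels)
tn, fp, fn, tp = mcm[:, 0, 0], mcm[:, 0, 1], mcm[:, 1, 0], mcm[:, 1, 1]

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
for name, se, sp in zip(labels, sensitivity, specificity):
    print(f"{name}: sensitivity={se:.2f} specificity={sp:.2f}")
```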

Answer 11

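If you have multiple classes in your classifier, you might want to use pandas-ml for that part. The confusion matrix of pandas-ml gives more detailed information. Check it out.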

Answer 12

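I think both of the answers are not fully correct. For example, suppose that we have the following arrays:
y_actual = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0]

y_predic = [1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0]

If we compute the FP, FN, TP and TN values manually, they should be as follows:

FP: 3 FN: 1 TP: 3 TN: 4

However, if we use the first answer, the results are as follows:

FP: 1 FN: 3 TP: 3 TN: 4

They are not correct, because in the first answer a False Positive should be where the actual value is 0 but the predicted value is 1, not the opposite. The same goes for False Negative.

And, if we use the second answer, the results are computed as follows:

FP: 3 FN: 1 TP: 4 TN: 3

The True Positive and True Negative numbers are not correct; they should be swapped.

Am I correct with my computations? Please let me know if I am missing something.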
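
The manual count can be double-checked with a few lines of plain Python. This sketch assumes 1 is the positive label and simply tallies each (actual, predicted) pair:

```python
from collections import Counter

y_actual = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0]
y_predic = [1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0]

# Count each (actual, predicted) pair once
pairs = Counter(zip(y_actual, y_predic))
TP = pairs[(1, 1)]
FP = pairs[(0, 1)]
FN = pairs[(1, 0)]
TN = pairs[(0, 0)]
print(f"FP: {FP} FN: {FN} TP: {TP} TN: {TN}")  # FP: 3 FN: 1 TP: 3 TN: 4
```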

Answer 13

#False positive cases (rows that are actually "Forged" but predicted "Genuine")
train = pd.merge(X_train, y_train, left_index=True, right_index=True)
y_train_pred = pd.DataFrame(y_train_pred)
y_train_pred.rename(columns={0: 'Predicted'}, inplace=True)
train = train.reset_index(drop=True).merge(y_train_pred.reset_index(drop=True),
                                           left_index=True, right_index=True)
train['FP'] = np.where((train['Banknote'] == "Forged") & (train['Predicted'] == "Genuine"), 1, 0)
train[train.FP != 0]

Answer 14

def getTPFPTNFN(y_true, y_pred):
    TP, FP, TN, FN = 0, 0, 0, 0
    for s_true, s_pred in zip(y_true, y_pred):
        if s_true == 1:
            if s_pred == 1:
                TP += 1
            else:
                FN += 1
        else:
            if s_pred == 0:
                TN += 1
            else:
                FP += 1
    return TP, FP, TN, FN

Answer 15

I tried some of the answers and found that they did not work.

This works for me:

from sklearn.metrics import classification_report

print(classification_report(y_test, predicted))

Answer 16

#False negative cases

test = pd.merge(Variables_test, Banknote_test,left_index=True, right_index=True)
Banknote_test_pred = pd.DataFrame(banknote_test_pred)
Banknote_test_pred.rename(columns={0 :'Predicted'}, inplace=True )
test = test.reset_index(drop=True).merge(Banknote_test_pred.reset_index(drop=True), left_index=True, right_index=True)
test['FN'] = np.where((test['Banknote']=="Genuine") & (test['Predicted']=="Forged"),1,0)
test[test.FN != 0]

Answer 17

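None of the answers given so far worked for me, as I sometimes ended up with a confusion matrix containing only a single entry. The following code is able to mitigate this issue: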

from sklearn.metrics import confusion_matrix
CM = confusion_matrix(y, y_hat)
            
try:
    TN = CM[0][0]
except IndexError:
    TN = 0
try:
    FN = CM[1][0]
except IndexError:
    FN = 0
try:
    TP = CM[1][1]
except IndexError:
    TP = 0
try:
    FP = CM[0][1]
except IndexError:
    FP = 0

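Please note that "y" is the ground truth and "y_hat" is the prediction. Be aware that a 1x1 matrix is always read as TN here, even when the only class present is the positive one; passing labels=[0, 1] to confusion_matrix guarantees a 2x2 matrix instead.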

Answer 18

This works fine.
Source - https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html

tn, fp, fn, tp = confusion_matrix(y_test, predicted).ravel()

Answer 19

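Although it is not related to scikit-learn, what you could also do is

# y_test and pred are integer arrays of 0s and 1s (e.g. NumPy arrays)
tp = sum(y_test & pred)
fp = sum((1 - y_test) & pred)
tn = sum((1 - y_test) & (1 - pred))
fn = sum(y_test & (1 - pred))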
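
The same bitwise idea can be tried on plain Python lists. This is a sketch with made-up data that assumes the labels are the integers 0 and 1; note that - binds more tightly than &, so the parentheses below only make the grouping explicit.

```python
y_test = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0]
pred = [1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0]

# For 0/1 integers, (1 - x) flips the label and & acts as a logical AND
tp = sum(t & p for t, p in zip(y_test, pred))
fp = sum((1 - t) & p for t, p in zip(y_test, pred))
tn = sum((1 - t) & (1 - p) for t, p in zip(y_test, pred))
fn = sum(t & (1 - p) for t, p in zip(y_test, pred))
print(tp, fp, tn, fn)  # 3 3 4 1
```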

Answer 20

Here's a fix to invoketheshell's buggy code (which currently appears as the accepted answer):

def performance_measure(y_actual, y_hat):
    TP = 0
    FP = 0
    TN = 0
    FN = 0

    for i in range(len(y_hat)):
        if y_actual[i] == y_hat[i] == 1:
            TP += 1
        if y_hat[i] == 1 and y_actual[i] == 0:
            FP += 1
        if y_hat[i] == y_actual[i] == 0:
            TN += 1
        if y_hat[i] == 0 and y_actual[i] == 1:
            FN += 1

    return (TP, FP, TN, FN)


20 answers

Answer 1  150  2017-04-10 19:28:13
Answer 2  53  accepted  2015-07-10 22:14:26
Answer 3  45  2016-10-29 22:09:28
Answer 4  24  2015-07-09 17:50:02
Answer 5  16  2018-12-20 13:13:37
Answer 6  6  2018-01-31 14:40:48
Answer 7  5  2018-02-08 18:29:16
Answer 8  5  2019-09-16 08:10:30
Answer 9  4  2018-05-23 09:43:23
Answer 10  4  2019-12-24 10:46:13
Answer 11  2  2017-02-16 19:06:31
Answer 12  1  2016-09-12 15:11:01
Answer 13  1  2020-10-20 04:31:20
Answer 14  1  2021-05-06 15:05:20
Answer 15  0  2019-07-16 05:28:17
Answer 16  0  2020-10-20 04:51:43
Answer 17  0  2021-06-14 12:24:18
Answer 18  0  2021-08-31 10:39:58
Answer 19  0  2022-03-15 08:56:12
Answer 20  -2  2018-02-05 18:17:03
