
Scikit-learn: How to obtain True Positive, True Negative, False Positive and False Negative

My problem:

I have a dataset which is a large JSON file. I read it and store it in the trainList variable.

Next, I pre-process it in order to be able to work with it.

Once I have done that, I start the classification:

  1. I use the k-fold cross-validation method in order to obtain the mean accuracy and train a classifier.
  2. I make the predictions and obtain the accuracy and confusion matrix of that fold.
  3. After this, I would like to obtain the True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN) values. I'll use these parameters to obtain the Sensitivity and Specificity.

Finally, I would use this to put in HTML in order to show a chart with the TPs of each label.

Code:

The variables I have for the moment:

trainList #It is a list with all the data of my dataset in JSON form
labelList #It is a list with all the labels of my data 

The main part of the method:

#I transform the data from JSON form to a numerical one
X=vec.fit_transform(trainList)

#I scale the matrix (I don't know why, but without it I get an error)
X=preprocessing.scale(X.toarray())

#I generate a KFold in order to make cross validation
kf = KFold(len(X), n_folds=10, indices=True, shuffle=True, random_state=1)

#I start the cross validation
for train_indices, test_indices in kf:
    X_train=[X[ii] for ii in train_indices]
    X_test=[X[ii] for ii in test_indices]
    y_train=[labelList[ii] for ii in train_indices]
    y_test=[labelList[ii] for ii in test_indices]

    #I train the classifier
    trained=qda.fit(X_train,y_train)

    #I make the predictions
    predicted=qda.predict(X_test)

    #I obtain the accuracy of this fold
    ac=accuracy_score(y_test, predicted)

    #I obtain the confusion matrix
    cm=confusion_matrix(y_test, predicted)

    #I should calculate the TP,TN, FP and FN 
    #I don't know how to continue

For the multi-class case, everything you need can be found from the confusion matrix. For example, if your confusion matrix looks like this:

[Image: example multi-class confusion matrix]

Then what you're looking for, per class, can be found like this:

[Image: per-class TP, FN, FP and TN regions highlighted on the confusion matrix]

Using pandas/numpy, you can do this for all classes at once like so:

FP = confusion_matrix.sum(axis=0) - np.diag(confusion_matrix)
FN = confusion_matrix.sum(axis=1) - np.diag(confusion_matrix)
TP = np.diag(confusion_matrix)
# .values assumes confusion_matrix is a pandas DataFrame; for a numpy array use confusion_matrix.sum()
TN = confusion_matrix.values.sum() - (FP + FN + TP)

# Sensitivity, hit rate, recall, or true positive rate
TPR = TP/(TP+FN)
# Specificity or true negative rate
TNR = TN/(TN+FP) 
# Precision or positive predictive value
PPV = TP/(TP+FP)
# Negative predictive value
NPV = TN/(TN+FN)
# Fall out or false positive rate
FPR = FP/(FP+TN)
# False negative rate
FNR = FN/(TP+FN)
# False discovery rate
FDR = FP/(TP+FP)

# Overall accuracy
ACC = (TP+TN)/(TP+FP+FN+TN)
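
A minimal runnable sketch of the above, assuming the matrix is built with sklearn's confusion_matrix (which returns a plain numpy array, so .values is not needed):

import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 2, 2, 0, 1, 1, 2, 0]
y_pred = [0, 2, 2, 2, 0, 1, 0, 2, 1]

cnf_matrix = confusion_matrix(y_true, y_pred)

FP = cnf_matrix.sum(axis=0) - np.diag(cnf_matrix)
FN = cnf_matrix.sum(axis=1) - np.diag(cnf_matrix)
TP = np.diag(cnf_matrix)
TN = cnf_matrix.sum() - (FP + FN + TP)

# per-class sensitivity and specificity
TPR = TP / (TP + FN)
TNR = TN / (TN + FP)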

If you have two lists with the predicted and actual values, as it appears you do, you can pass them to a function that will calculate TP, FP, TN and FN, with something like this:

def perf_measure(y_actual, y_hat):
    TP = 0
    FP = 0
    TN = 0
    FN = 0

    for i in range(len(y_hat)):
        if y_actual[i] == y_hat[i] == 1:
            TP += 1
        if y_hat[i] == 1 and y_actual[i] != y_hat[i]:
            FP += 1
        if y_actual[i] == y_hat[i] == 0:
            TN += 1
        if y_hat[i] == 0 and y_actual[i] != y_hat[i]:
            FN += 1

    return (TP, FP, TN, FN)
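
For example, with two small 0/1 label lists (values assumed to be integers):

y_actual = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0]
y_hat    = [1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0]

print(perf_measure(y_actual, y_hat))  # (3, 3, 4, 1), i.e. TP, FP, TN, FN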

From here I think you will be able to calculate the rates that interest you, and other performance measures like specificity and sensitivity.

According to the scikit-learn documentation,

http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html#sklearn.metrics.confusion_matrix

By definition a confusion matrix C is such that C[i, j] is equal to the number of observations known to be in group i but predicted to be in group j.

Thus in binary classification, the count of true negatives is C[0,0], false negatives is C[1,0], true positives is C[1,1] and false positives is C[0,1].

CM = confusion_matrix(y_true, y_pred)

TN = CM[0][0]
FN = CM[1][0]
TP = CM[1][1]
FP = CM[0][1]
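
From these four counts, the Sensitivity and Specificity the question asks for follow directly:

sensitivity = TP / (TP + FN)  # true positive rate, a.k.a. recall
specificity = TN / (TN + FP)  # true negative rate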

You can obtain all of the parameters from the confusion matrix. The structure of the confusion matrix (which is a 2x2 matrix) is as follows, assuming the first index relates to the positive label and the rows relate to the true labels (note that this is the reverse of scikit-learn's default ordering):

TP|FN
FP|TN

So

TP = cm[0][0]
FN = cm[0][1]
FP = cm[1][0]
TN = cm[1][1]

More details at https://en.wikipedia.org/wiki/Confusion_matrix
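
If you want confusion_matrix to produce this TP-first layout directly, one option (a sketch, assuming binary 0/1 labels) is to list the positive label first via the labels parameter:

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
# now cm[0][0] = TP, cm[0][1] = FN, cm[1][0] = FP, cm[1][1] = TN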

The one-liner to get true positives etc. out of the confusion matrix is to ravel it:

from sklearn.metrics import confusion_matrix

y_true = [1, 1, 0, 0]
y_pred = [1, 0, 1, 0]   

tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(tn, fp, fn, tp)  # 1 1 1 1

One should set the labels parameter in case the data contains only a single class, e.g. only true positives. Setting labels correctly ensures that the confusion matrix always has a 2x2 shape.
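
For example, if every sample is a positive that was predicted correctly, the matrix collapses to 1x1 without the labels argument:

from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1]
y_pred = [1, 1, 1]

print(confusion_matrix(y_true, y_pred).shape)                 # (1, 1)
print(confusion_matrix(y_true, y_pred, labels=[0, 1]).shape)  # (2, 2)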

In scikit-learn's metrics module there is a confusion_matrix function which gives you the desired output.

You can use any classifier that you want. Here I used KNeighbors as an example.

from sklearn import metrics, neighbors

clf = neighbors.KNeighborsClassifier()
# ... fit clf on the training data first, e.g. clf.fit(X_train, y_train)

X_test = ...
y_test = ...

expected = y_test
predicted = clf.predict(X_test)

conf_matrix = metrics.confusion_matrix(expected, predicted)

>>> print(conf_matrix)
[[1403   87]
 [  56 3159]]

The docs: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html#sklearn.metrics.confusion_matrix

I wrote a version that works using only numpy. I hope it helps you.

import numpy as np

def perf_metrics_2X2(yobs, yhat):
    """
    Returns the specificity, sensitivity, positive predictive value, and 
    negative predictive value 
    of a 2X2 table.

    where:
    0 = negative case
    1 = positive case

    Parameters
    ----------
    yobs :  array of positive and negative ``observed`` cases
    yhat : array of positive and negative ``predicted`` cases

    Returns
    -------
    sensitivity  = TP / (TP+FN)
    specificity  = TN / (TN+FP)
    pos_pred_val = TP/ (TP+FP)
    neg_pred_val = TN/ (TN+FN)

    Author: Julio Cardenas-Rodriguez
    """
    TP = np.sum( yobs[yobs==1] == yhat[yobs==1] )
    TN = np.sum( yobs[yobs==0] == yhat[yobs==0] )
    # FP: actual negatives that were predicted positive
    FP = np.sum( yobs[yobs==0] != yhat[yobs==0] )
    # FN: actual positives that were predicted negative
    FN = np.sum( yobs[yobs==1] != yhat[yobs==1] )

    sensitivity  = TP / (TP+FN)
    specificity  = TN / (TN+FP)
    pos_pred_val = TP/ (TP+FP)
    neg_pred_val = TN/ (TN+FN)

    return sensitivity, specificity, pos_pred_val, neg_pred_val
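
A quick usage sketch (the inputs must be numpy arrays, since the function relies on boolean indexing):

import numpy as np

yobs = np.array([1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0])
yhat = np.array([1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0])

# TP=3, TN=4, FP=3, FN=1 for these arrays, so this should print
# approximately (0.75, 0.571, 0.5, 0.8)
print(perf_metrics_2X2(yobs, yhat))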

Just in case someone is looking for the same in the MULTI-CLASS example:

def perf_measure(y_actual, y_pred):
    # iterate over the classes in a stable order
    class_id = sorted(set(y_actual).union(set(y_pred)))
    TP = []
    FP = []
    TN = []
    FN = []

    for index, _id in enumerate(class_id):
        TP.append(0)
        FP.append(0)
        TN.append(0)
        FN.append(0)
        for i in range(len(y_pred)):
            if y_actual[i] == y_pred[i] == _id:
                TP[index] += 1
            if y_pred[i] == _id and y_actual[i] != _id:
                FP[index] += 1
            if y_actual[i] != _id and y_pred[i] != _id:
                TN[index] += 1
            if y_actual[i] == _id and y_pred[i] != _id:
                FN[index] += 1

    return class_id, TP, FP, TN, FN
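
For example, on a small three-class problem:

y_actual = [0, 1, 2, 2, 0, 1]
y_pred   = [0, 2, 2, 2, 0, 0]

class_id, TP, FP, TN, FN = perf_measure(y_actual, y_pred)
print(class_id)        # [0, 1, 2]
print(TP, FP, TN, FN)  # [2, 0, 2] [1, 0, 1] [3, 4, 3] [0, 2, 0]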

You can try sklearn.metrics.classification_report as below:

from sklearn import metrics

y_true = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0]

print(metrics.classification_report(y_true, y_pred))

Output:

             precision    recall  f1-score   support

          0       0.80      0.57      0.67         7
          1       0.50      0.75      0.60         4

avg / total       0.69      0.64      0.64        11

In scikit-learn version 0.22, you can do it like this:

from sklearn.metrics import multilabel_confusion_matrix

y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]

mcm = multilabel_confusion_matrix(y_true, y_pred, labels=["ant", "bird", "cat"])

tn = mcm[:, 0, 0]
tp = mcm[:, 1, 1]
fn = mcm[:, 1, 0]
fp = mcm[:, 0, 1]
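
For the example above, each of these is an array with one entry per label, in the order given (ant, bird, cat):

print(tp)  # [2 0 2]
print(fp)  # [1 0 1]
print(fn)  # [0 1 1]
print(tn)  # [3 5 2]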

If you have more than one class in your classifier, you might want to use pandas-ml for that part. The ConfusionMatrix from pandas-ml gives more detailed information. Check it out:

[Image: pandas-ml confusion matrix output with per-class statistics]

I think both of the answers are not fully correct. For example, suppose that we have the following arrays:

y_actual = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0]

y_predic = [1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0]

If we compute the FP, FN, TP and TN values manually, they should be as follows:

FP: 3 FN: 1 TP: 3 TN: 4

However, if we use the first answer, the results are given as follows:

FP: 1 FN: 3 TP: 3 TN: 4

They are not correct, because in the first answer, False Positive should be where the actual value is 0 but the predicted value is 1, not the opposite. The same applies to False Negative.

And, if we use the second answer, the results are computed as follows:

FP: 3 FN: 1 TP: 4 TN: 3

The True Positive and True Negative counts are not correct; they should be swapped.

Am I correct in my computations? Please let me know if I am missing something.

# False positive cases (banknote example): rows where the true label is
# "Forged" but the model predicted "Genuine"
train = pd.merge(X_train, y_train, left_index=True, right_index=True)
y_train_pred = pd.DataFrame(y_train_pred)
y_train_pred.rename(columns={0: 'Predicted'}, inplace=True)
train = train.reset_index(drop=True).merge(y_train_pred.reset_index(drop=True),
                                           left_index=True, right_index=True)
train['FP'] = np.where((train['Banknote'] == "Forged") & (train['Predicted'] == "Genuine"), 1, 0)
train[train.FP != 0]

def getTPFPTNFN(y_true, y_pred):
    TP, FP, TN, FN = 0, 0, 0, 0
    for s_true, s_pred in zip(y_true, y_pred):
        if s_true == 1:
            if s_pred == 1: 
                TP += 1
            else:
                FN += 1
        else:
            if s_pred == 0:
                TN += 1
            else:
                FP += 1
    return TP, FP, TN, FN
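
For example:

y_true = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0]

print(getTPFPTNFN(y_true, y_pred))  # (3, 3, 4, 1)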

I have tried some of the answers and found them not working.

This works for me:

from sklearn.metrics import classification_report

print(classification_report(y_test, predicted)) 

# False negatives

test = pd.merge(Variables_test, Banknote_test, left_index=True, right_index=True)
Banknote_test_pred = pd.DataFrame(banknote_test_pred)
Banknote_test_pred.rename(columns={0: 'Predicted'}, inplace=True)
test = test.reset_index(drop=True).merge(Banknote_test_pred.reset_index(drop=True), left_index=True, right_index=True)
test['FN'] = np.where((test['Banknote'] == "Genuine") & (test['Predicted'] == "Forged"), 1, 0)
test[test.FN != 0]

None of the answers given so far worked for me, as I sometimes ended up with a confusion matrix that had a single entry only. The following code is able to mitigate this issue:

from sklearn.metrics import confusion_matrix

CM = confusion_matrix(y, y_hat)
try:
    TN = CM[0][0]
except IndexError:
    TN = 0
try:
    FN = CM[1][0]
except IndexError:
    FN = 0
try:
    TP = CM[1][1]
except IndexError:
    TP = 0
try:
    FP = CM[0][1]
except IndexError:
    FP = 0

Please note that "y" is the ground truth and "y_hat" is the prediction. One caveat: if the only class present is 1, CM[0][0] will actually contain the true positives, so passing labels=[0, 1] to confusion_matrix, as in an earlier answer, is a more robust way to guarantee the 2x2 shape.

Although it does not relate to scikit-learn, what you could also do is

# assumes y_test and pred are numpy arrays (or pandas Series) of 0/1 integers
tp = sum(y_test & pred)
fp = sum((1 - y_test) & pred)
tn = sum((1 - y_test) & (1 - pred))
fn = sum(y_test & (1 - pred))
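
A short sketch with concrete values, assuming numpy arrays of 0/1 integers (bitwise & does not work this way on plain Python lists):

import numpy as np

y_test = np.array([1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0])
pred   = np.array([1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0])

print(sum(y_test & pred))              # tp = 3
print(sum((1 - y_test) & pred))        # fp = 3
print(sum((1 - y_test) & (1 - pred)))  # tn = 4
print(sum(y_test & (1 - pred)))        # fn = 1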

Here's a fix to invoketheshell's buggy code (which currently appears as the accepted answer):

def performance_measure(y_actual, y_hat):
    TP = 0
    FP = 0
    TN = 0
    FN = 0

    for i in range(len(y_hat)): 
        if y_actual[i] == y_hat[i]==1:
            TP += 1
        if y_hat[i] == 1 and y_actual[i] == 0:
            FP += 1
        if y_hat[i] == y_actual[i] == 0:
            TN +=1
        if y_hat[i] == 0 and y_actual[i] == 1:
            FN +=1

    return(TP, FP, TN, FN)

