
roc_auc_score mismatch between y_test and y_score

I'm trying to calculate the following:

from sklearn.metrics import roc_auc_score

auc = roc_auc_score(gt, pr, multi_class="ovr")

where gt is a list of length 3470208 containing integer values between 0 and 41, and pr is a list of the same length whose elements are lists of length 42, each holding probabilities that sum to 1.

However, I'm getting the following error:

ValueError: Number of classes in y_true not equal to the number of columns in 'y_score'

So I am somewhat lost, since the number of classes in y_true (gt) should be 42 because I have a list of integers from 0 to 41.

And since pr is a list of lists of size 42, I think it should work.

Help will be appreciated!

Make sure that all integers between 0 and 41 (inclusive) exist in gt.

A simple example:

import numpy as np
from sklearn.metrics import roc_auc_score

# results in error:
gt1 = np.array([0,1,3])
pr1 = np.array(
    [[0.1, 0.7, 0.1, 0.1], 
     [0.3, 0.3, 0.2, 0.2], 
     [0.5, 0.1, 0.1, 0.3]]
)
#roc_auc_score(gt1, pr1, multi_class='ovr')


# does not result in error:
gt2 = np.array([0,2,1,3])
pr2 = np.array(
    [[0.1, 0.7, 0.1, 0.1], 
     [0.3, 0.3, 0.2, 0.2], 
     [0.5, 0.1, 0.1, 0.3],
     [0.3, 0.3, 0.2, 0.2]] 
)
#roc_auc_score(gt2, pr2, multi_class='ovr')

Because integer/label 2 does not exist in gt1, it throws an error. In other words, the number of classes in gt1 (3) is not equal to the number of columns in pr1 (4).
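A quick way to see which labels are causing the mismatch is to compare the unique values in the ground truth against the expected label range. The helper below is a minimal sketch of that check (find_missing_labels is a name made up for illustration, not part of scikit-learn):

import numpy as np

def find_missing_labels(y_true, n_classes):
    """Return the labels in range(n_classes) that never appear in y_true."""
    return sorted(set(range(n_classes)) - set(np.unique(y_true)))

# With the arrays from the question this would be find_missing_labels(gt, 42).
print(find_missing_labels([0, 1, 3], 4))  # [2] -> class 2 has no ground-truth sample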

The roc_auc_score method has a labels parameter that can be used to specify missing labels.

Unfortunately, this only works with the multi_class="ovo" mode, and not the "ovr" mode.

# without labels
gt1 = np.array([0,1,3])
pr1 = np.array(
    [[0.1, 0.7, 0.1, 0.1], 
     [0.3, 0.3, 0.2, 0.2], 
     [0.5, 0.1, 0.1, 0.3]]
)
roc_auc_score(gt1, pr1, multi_class='ovo')
> ValueError: Number of classes in y_true not equal to the number of columns in 'y_score'

# with labels and multi_class="ovo":
gt1 = np.array([0,1,3])
pr1 = np.array(
    [[0.1, 0.7, 0.1, 0.1], 
     [0.3, 0.3, 0.2, 0.2], 
     [0.5, 0.1, 0.1, 0.3]]
)
roc_auc_score(gt1, pr1, multi_class='ovo', labels=[0, 1, 2, 3])
> 0.5

# with labels and multi_class="ovr":
gt1 = np.array([0,1,3])
pr1 = np.array(
    [[0.1, 0.7, 0.1, 0.1], 
     [0.3, 0.3, 0.2, 0.2], 
     [0.5, 0.1, 0.1, 0.3]]
)
roc_auc_score(gt1, pr1, multi_class='ovr', labels=[0, 1, 2, 3])
> ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.

In this case, there is only one class present in y_true because the roc_auc_score function iterates over each class (treating it as class A) and compares it against all the other classes combined (class B). For class 2, the y_true array is equal to [B, B, B], so there is only one class and the ROC AUC score cannot be calculated.
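To see this concretely, you can binarize gt1 the way a one-vs-rest strategy does; the sketch below uses label_binarize only to illustrate the idea:

import numpy as np
from sklearn.preprocessing import label_binarize

gt1 = np.array([0, 1, 3])

# One binary column per class; column 2 is all zeros because class 2
# never appears in gt1, so its per-class ROC AUC is undefined.
print(label_binarize(gt1, classes=[0, 1, 2, 3]))
# [[1 0 0 0]
#  [0 1 0 0]
#  [0 0 0 1]]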

Get the unique values:

unique_values = np.unique(predictions)

Get the unique values as a list:

classLabels = unique_values.tolist()

Find the ROC AUC score:

rocAucScore = roc_auc_score(gtruth, pred_prob, multi_class='ovo', labels=classLabels)
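Putting that recipe together, a self-contained sketch could look like the following. The data is made up for illustration; note that labels must list one class per column of pred_prob, so this only works when np.unique(predictions) happens to cover every class:

import numpy as np
from sklearn.metrics import roc_auc_score

# Made-up data: 4 samples, 4 classes, and class 2 never appears in the ground truth.
gtruth = np.array([0, 1, 1, 3])
pred_prob = np.array(
    [[0.7, 0.1, 0.1, 0.1],
     [0.2, 0.5, 0.2, 0.1],
     [0.1, 0.2, 0.5, 0.2],
     [0.1, 0.1, 0.2, 0.6]]
)

# Hard-label predictions; here they happen to cover all four classes.
predictions = pred_prob.argmax(axis=1)

# Get the unique predicted labels as a list
unique_values = np.unique(predictions)
classLabels = unique_values.tolist()

# Works because classLabels has one entry per column of pred_prob
rocAucScore = roc_auc_score(gtruth, pred_prob, multi_class='ovo', labels=classLabels)
print(rocAucScore)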
