roc_auc_score mismatch between y_test and y_score

Question

I am trying to compute the following:

auc = roc_auc_score(gt, pr, multi_class="ovr")

where gt is a list of 3470208 integers, each between 0 and 41, and pr is a list of the same length (3470208) whose elements are lists of 42 probabilities that sum to 1.

However, I get the following error:

ValueError: Number of classes in y_true not equal to the number of columns in 'y_score'

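So I am a bit lost, because I expected the number of classes in y_true (gt) to be 42, given that it is a list of integers from 0 to 41.

And since pr is a list of lists of size 42, I thought it should work.

Any help would be appreciated!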

Answer 1

Make sure that every integer from 0 to 41 (inclusive) actually occurs in gt.

A simple example:

import numpy as np
from sklearn.metrics import roc_auc_score

# results in error:
gt1 = np.array([0,1,3])
pr1 = np.array(
    [[0.1, 0.7, 0.1, 0.1], 
     [0.3, 0.3, 0.2, 0.2], 
     [0.5, 0.1, 0.1, 0.3]]
)
#roc_auc_score(gt1, pr1, multi_class='ovr')


# does not result in error:
gt2 = np.array([0,2,1,3])
pr2 = np.array(
    [[0.1, 0.7, 0.1, 0.1], 
     [0.3, 0.3, 0.2, 0.2], 
     [0.5, 0.1, 0.1, 0.3],
     [0.3, 0.3, 0.2, 0.2]] 
)
#roc_auc_score(gt2, pr2, multi_class='ovr')

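Because the integer/label 2 does not occur in gt1, an error is raised. In other words, the number of classes in gt1 (3) is not equal to the number of columns in pr1 (4).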
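To find out which labels are absent from a large ground-truth list, a quick check along the following lines may help. This is only a sketch: the helper name missing_classes is hypothetical, and it assumes that column j of the score matrix corresponds to class j.

```python
import numpy as np


def missing_classes(y_true, n_classes):
    """Return the class indices in range(n_classes) that never occur in y_true.

    Hypothetical helper; assumes classes are numbered 0 .. n_classes - 1.
    """
    present = np.unique(y_true)
    return np.setdiff1d(np.arange(n_classes), present)


gt1 = np.array([0, 1, 3])
pr1 = np.array(
    [[0.1, 0.7, 0.1, 0.1],
     [0.3, 0.3, 0.2, 0.2],
     [0.5, 0.1, 0.1, 0.3]]
)

# The number of columns in the score matrix gives the expected class count.
absent = missing_classes(gt1, pr1.shape[1])
print(absent)  # [2]
```

For the data in the question, the equivalent call would be missing_classes(gt, 42); an empty result would mean every class is present.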

Answer 2

The roc_auc_score function has a labels parameter that can be used to specify missing labels.

Unfortunately, this only works with multi_class="ovo", not with "ovr".

# without labels
gt1 = np.array([0,1,3])
pr1 = np.array(
    [[0.1, 0.7, 0.1, 0.1], 
     [0.3, 0.3, 0.2, 0.2], 
     [0.5, 0.1, 0.1, 0.3]]
)
roc_auc_score(gt1, pr1, multi_class='ovo')
> ValueError: Number of classes in y_true not equal to the number of columns in 'y_score'

# with labels and multi-class="ovo":
gt1 = np.array([0,1,3])
pr1 = np.array(
    [[0.1, 0.7, 0.1, 0.1], 
     [0.3, 0.3, 0.2, 0.2], 
     [0.5, 0.1, 0.1, 0.3]]
)
roc_auc_score(gt1, pr1, multi_class='ovo', labels=[0, 1, 2, 3])
> 0.5

# with labels and multi-class="ovr":
gt1 = np.array([0,1,3])
pr1 = np.array(
    [[0.1, 0.7, 0.1, 0.1], 
     [0.3, 0.3, 0.2, 0.2], 
     [0.5, 0.1, 0.1, 0.3]]
)
roc_auc_score(gt1, pr1, multi_class='ovr', labels=[0, 1, 2, 3])
> ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.

In this case only one class is present in y_true because, in "ovr" mode, roc_auc_score iterates over each class (call it class A) and compares it against all the other classes taken together (call them class B). For class 2, the y_true array equals [B, B, B], so only one class is present and the ROC AUC score cannot be computed.
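The one-vs-rest split described above can be illustrated with plain NumPy. This is a sketch of the idea only, not scikit-learn's internal code; the variable names binary_truth and defined are made up for the illustration.

```python
import numpy as np

gt1 = np.array([0, 1, 3])
labels = [0, 1, 2, 3]

# One-vs-rest builds one binary problem per class:
# 1 where the sample belongs to the class (A), 0 for every other class (B).
defined = {}
for c in labels:
    binary_truth = (gt1 == c).astype(int)
    # A ROC curve needs at least one positive and one negative sample.
    defined[c] = np.unique(binary_truth).size == 2
    print(c, binary_truth.tolist(), "ok" if defined[c] else "undefined")
```

Class 2 yields [0, 0, 0], that is, no positive sample at all, which is why the binary ROC AUC for that class is undefined.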

Answer 3

Get the unique values:

unique_values = np.unique(predictions)

Get the unique values as a list:

classLabels = unique_values.tolist()

Compute the ROC AUC score:

rocAucScore = roc_auc_score(gtruth, pred_prob, multi_class='ovo', labels=classLabels)
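The steps above can be put together as a runnable sketch. One change is an assumption on my part rather than something from the answer: the label list is derived from the number of columns in pred_prob instead of from the unique predicted values, because labels needs one entry per column of the score matrix. The sample data is borrowed from Answer 2.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

gtruth = np.array([0, 1, 3])  # class 2 never occurs in the ground truth
pred_prob = np.array(
    [[0.1, 0.7, 0.1, 0.1],
     [0.3, 0.3, 0.2, 0.2],
     [0.5, 0.1, 0.1, 0.3]]
)

# Assumption: column j of pred_prob holds the probability of class j,
# so the label list is simply 0 .. n_columns - 1.
class_labels = np.arange(pred_prob.shape[1]).tolist()

# "ovo" averages over pairs of classes that occur in gtruth,
# so the absent class 2 does not make the score undefined.
score = roc_auc_score(gtruth, pred_prob, multi_class="ovo", labels=class_labels)
print(score)
```

With this data the call reproduces the 0.5 reported in Answer 2.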

Answer 1: 2 votes, accepted, 2020-07-28 19:50:49
Answer 2: 1 vote, 2021-07-30 21:44:11
Answer 3: 0 votes, 2023-01-24 20:11:22
