ValueError: Classification metrics can't handle a mix of multiclass and multilabel-indicator targets in ROC curve calculation
I am trying to plot a ROC curve for a multiclass classification problem.
First, I compute y_pred and y_proba with the following code:
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=0)

# Train a DecisionTreeClassifier
dtree_model = DecisionTreeClassifier(max_depth=2).fit(X_train, y_train)
y_pred = dtree_model.predict(X_test)
y_proba = dtree_model.predict_proba(X_test)
After that, I use the following function, calculate_tpr_fpr, to compute the TPR and FPR:
from sklearn.metrics import confusion_matrix

def calculate_tpr_fpr(y_test, y_pred):
    '''
    Calculates the True Positive Rate (tpr) and the False Positive Rate (fpr) based on real and predicted observations
    Args:
        y_test: The list or series with the real classes
        y_pred: The list or series with the predicted classes
    Returns:
        tpr: The True Positive Rate of the classifier
        fpr: The False Positive Rate of the classifier
    '''
    # Calculates the confusion matrix and recovers each element
    cm = confusion_matrix(y_test, y_pred)
    TN = cm[0, 0]
    FP = cm[0, 1]
    FN = cm[1, 0]
    TP = cm[1, 1]
    # Calculates tpr and fpr
    tpr = TP / (TP + FN)  # sensitivity - true positive rate
    fpr = 1 - TN / (TN + FP)  # 1-specificity - false positive rate
    return tpr, fpr
Then I try to use this function to build the lists of FPR and TPR values needed to plot the curve:
def get_all_roc_coordinates(y_test, y_proba):
    '''
    Calculates all the ROC Curve coordinates (tpr and fpr) by considering each point as a threshold for the prediction of the class.
    Args:
        y_test: The list or series with the real classes.
        y_proba: The array with the probabilities for each class, obtained by using the `.predict_proba()` method.
    Returns:
        tpr_list: The list of TPRs representing each threshold.
        fpr_list: The list of FPRs representing each threshold.
    '''
    tpr_list = [0]
    fpr_list = [0]
    for i in range(len(y_proba)):
        threshold = y_proba[i]
        y_pred = y_proba = threshold
        tpr, fpr = calculate_tpr_fpr(y_test, y_pred)
        tpr_list.append(tpr)
        fpr_list.append(fpr)
    return tpr_list, fpr_list
But it gives me the following error:
ValueError: Classification metrics can't handle a mix of multiclass and multilabel-indicator targets
Note that the Y column is multiclass, with values {0, 1, 2}. I also tried making sure y is a string rather than an integer, but it gives me the same error.
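The error message can be reproduced in isolation, which may help show where it comes from. The sketch below assumes the predictions passed to confusion_matrix end up as a thresholded copy of the whole probability matrix (one column per class). The arrays y_true and proba are made-up stand-ins for y_test and y_proba, not data from the question.

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.utils.multiclass import type_of_target

# Hypothetical stand-ins for y_test and y_proba with three classes.
y_true = np.array([0, 1, 2, 1])
proba = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
    [0.3, 0.5, 0.2],
])

# Thresholding the full matrix yields a 2-D boolean array,
# one indicator column per class.
y_bad = proba > 0.5

# scikit-learn infers a target type for each argument before computing metrics.
true_kind = type_of_target(y_true)
bad_kind = type_of_target(y_bad)
print(true_kind, bad_kind)

# Passing the two mismatched target types to a classification metric fails.
try:
    confusion_matrix(y_true, y_bad)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

Under these assumptions the labels are a 1-D multiclass vector while the predictions are a 2-D indicator matrix, and confusion_matrix refuses to compare the two.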
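You have 3 classes, but calculate_tpr_fpr() only handles 2. Also, you probably meant y_pred = y_proba > threshold. Either way, it won't be that easy, because you have 3 columns of class scores. The simplest approach seems to be plotting one-vs-rest curves, treating each column separately: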
from sklearn.metrics import roc_curve
from sklearn.preprocessing import label_binarize
import matplotlib.pyplot as plt

classes = range(y_proba.shape[1])
for i in classes:
    fpr, tpr, _ = roc_curve(label_binarize(y_test, classes=classes)[:,i], y_proba[:,i])
    plt.plot(fpr, tpr, alpha=0.7)
plt.legend(classes)
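If you would rather keep a hand-written TPR/FPR calculation than call roc_curve, the same one-vs-rest idea can be applied to a single class and a single scalar threshold. The following is only a minimal sketch: the helper name one_vs_rest_tpr_fpr, its parameters, and the sample data are hypothetical and not part of the original code. It counts a score as positive when it is greater than or equal to the threshold.

```python
import numpy as np

def one_vs_rest_tpr_fpr(y_true, y_score, positive_class, threshold):
    """Hypothetical helper: TPR and FPR for one class against all others."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)

    is_pos = y_true == positive_class   # binarized ground truth (class vs rest)
    pred_pos = y_score >= threshold     # binarized prediction for this class

    tp = np.sum(is_pos & pred_pos)
    fn = np.sum(is_pos & ~pred_pos)
    fp = np.sum(~is_pos & pred_pos)
    tn = np.sum(~is_pos & ~pred_pos)

    # Guard against empty positive or negative groups.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return float(tpr), float(fpr)

# Made-up labels and made-up scores for class 1 (one column of y_proba).
y_true = [0, 1, 2, 1, 0, 2]
scores_class_1 = [0.1, 0.8, 0.3, 0.4, 0.6, 0.2]

tpr, fpr = one_vs_rest_tpr_fpr(y_true, scores_class_1, positive_class=1, threshold=0.5)
print(tpr, fpr)
```

Calling such a helper once per distinct score in a column, and once per column, would give one list of (fpr, tpr) points per class, which is the data each one-vs-rest curve is drawn from.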