ROC curve for multi-class classification without one-vs-all in Python

I have a multi-class classification problem with 9 different classes. I am using the AdaBoostClassifier class from scikit-learn to train my model without the one-vs-all technique, as the number of classes is very high and it might be inefficient.

I have tried following the example in the scikit-learn documentation [1], but it uses the one-vs-all technique, which is substantially different. In my approach I only get one prediction per event, i.e. if I have n classes, the prediction is a single value from among the n classes. With the one-vs-all approach, on the other hand, the prediction is an array of size n holding a sort of likelihood value per class.

[1] https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html#sphx-glr-auto-examples-model-selection-plot-roc-py

The code is:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt # Matplotlib plotting library for basic visualisation
%matplotlib inline

from sklearn.model_selection import train_test_split 
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_curve, auc
from sklearn import preprocessing


# Read data
df = pd.read_pickle('data.pkl')

# Create the dependent variable class
# This will substitute each of the n classes from 
# text to number
factor = pd.factorize(df['target_var'])
df['target_var'] = factor[0]
definitions = factor[1]

X = df.drop('target_var', axis=1)
y = df['target_var']

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 0)

bdt_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=2),
    n_estimators=250,
    learning_rate=0.3)

bdt_clf.fit(X_train, y_train)

y_pred = bdt_clf.predict(X_test)

# Reverse factorize (convert y_pred from 0s, 1s, 2s, etc. back to the original values)
reversefactor = dict(zip(range(len(definitions)), definitions))
y_test_rev = np.vectorize(reversefactor.get)(y_test)
y_pred_rev = np.vectorize(reversefactor.get)(y_pred)

I tried calling the roc_curve function directly, and also binarising the labels first, but I always get the same error message.

def multiclass_roc_auc(y_test, y_pred):
    lb = preprocessing.LabelBinarizer()
    lb.fit(y_test)
    y_test = lb.transform(y_test)
    y_pred = lb.transform(y_pred)
    return roc_curve(y_test, y_pred)

multiclass_roc_auc(y_test, y_pred)

The error message is:

ValueError: multilabel-indicator format is not supported

How could this be sorted out? Am I missing some important concept?

An ROC (receiver operating characteristic) curve plots the true positive rate against the false positive rate of a binary classifier as its decision threshold is varied.
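Because the curve is defined for a binary problem, roc_curve rejects a two-dimensional label-indicator array but accepts any single column of it. The sketch below illustrates this with small made-up label arrays (they are assumptions for the demonstration, not the data from the question):

```python
import numpy as np
from sklearn.metrics import roc_curve
from sklearn.preprocessing import LabelBinarizer

# Made-up true labels and hard predictions for three classes (illustrative only).
y_true = np.array([0, 1, 2, 2, 1, 0])
y_hard = np.array([0, 2, 2, 2, 1, 0])

lb = LabelBinarizer().fit(y_true)
y_true_bin = lb.transform(y_true)  # one indicator column per class
y_hard_bin = lb.transform(y_hard)

# Passing the full 2-D indicator arrays is not a binary problem, so this raises.
try:
    roc_curve(y_true_bin, y_hard_bin)
    error = None
except ValueError as exc:
    error = exc
print("full indicator matrix:", error)

# A single column is a binary "this class vs the rest" problem, which works.
fpr, tpr, thresholds = roc_curve(y_true_bin[:, 2], y_hard_bin[:, 2])
print("class 2 vs rest:", fpr, tpr)
```

Note that hard 0/1 predictions give only a single operating point per class; a full curve needs a continuous score per class.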

The area under the curve (AUC) indicates how well a binary classifier separates the two classes: 0.5 corresponds to random guessing and 1.0 to perfect separation.
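If per-class curves are still wanted, one possible approach is to keep the single multiclass AdaBoostClassifier and apply the "one class vs the rest" split only at evaluation time, using the per-class scores from predict_proba. This is a minimal sketch under stated assumptions: the data is synthetic (make_classification with 3 classes stands in for the real 9-class data set) and the hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class data standing in for the real data set (an assumption).
X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# One multiclass model; no one-vs-all wrapper is used for training.
clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=2),
                         n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

# predict_proba returns one score per class: shape (n_samples, n_classes).
proba = clf.predict_proba(X_test)
y_test_bin = label_binarize(y_test, classes=clf.classes_)

# One binary ROC curve (and its AUC) per class, computed column by column.
per_class_auc = {}
for i, label in enumerate(clf.classes_):
    fpr, tpr, _ = roc_curve(y_test_bin[:, i], proba[:, i])
    per_class_auc[int(label)] = auc(fpr, tpr)

# roc_auc_score can do the same averaging directly.
macro_auc = roc_auc_score(y_test, proba, multi_class="ovr", average="macro")
print(per_class_auc, macro_auc)
```

Each (fpr, tpr) pair inside the loop can be passed to plt.plot to draw that class's curve.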

For multiclass problems you can compute the accuracy, but this can be misleading if your data is not distributed uniformly across the classes. Appropriate sampling can overcome this.
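To see why plain accuracy can mislead on unevenly distributed classes, consider a made-up example (the label counts below are invented for illustration) in which a model always predicts the majority class. Comparing accuracy with balanced accuracy, i.e. the mean of the per-class recalls, is one way to spot the problem; it is offered here as an alternative diagnostic, not as the sampling fix mentioned above.

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Invented labels: 90 samples of class 0, 5 of class 1, 5 of class 2.
y_true = np.array([0] * 90 + [1] * 5 + [2] * 5)

# A degenerate "model" that always predicts the majority class.
y_pred = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_pred)               # fraction of correct labels
bal_acc = balanced_accuracy_score(y_true, y_pred)  # mean recall over classes
print(acc, bal_acc)
```

The accuracy looks high because class 0 dominates, while the balanced accuracy shows that two of the three classes are never recognised.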

The AdaBoostClassifier you are using has a score method, which returns the mean accuracy on the given test data and labels.
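As a quick check, score should agree with accuracy_score applied to the output of predict. The sketch below uses synthetic data and arbitrary hyperparameters as stand-ins for the setup in the question (both are assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class data standing in for the real data set (an assumption).
X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=2),
                         n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

# score() is the mean accuracy of predict() on the given data.
mean_accuracy = clf.score(X_test, y_test)
manual_accuracy = accuracy_score(y_test, clf.predict(X_test))
print(mean_accuracy, manual_accuracy)
```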
