ROC curve for multiclass classification in R

Question

I have a dataset with 6 classes and I would like to plot a ROC curve for a multiclass classification. The first answer in this thread, given by Achim Zeileis, is a very good one:

ROC curve in R using rpart package?

But this works only for binomial classification. The error I get is "Error in prediction, Number of classes is not equal to 2". Has anyone done this for a multiclass classification?

Here is a simple example of what I am trying to do.

data <- read.csv("colors.csv")

Let's say data$cType has 6 values (or levels): red, green, blue, yellow, black and white.

Is there any way to plot a ROC curve for these 6 classes? Any working example with more than 2 classes would be appreciated.

Answer 1

Answering an old question while having the same requirement: I've found that the scikit-learn documentation explains a few approaches well.

http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html

The approaches mentioned include:

"Binarizing", i.e. converting the problem to binary classification, using either macro-averaging or micro-averaging
Drawing multiple ROC curves, one per label
One vs. all

Here is the example copied from the above link, which illustrates one vs. all and micro-averaging using their libraries:

import numpy as np

from sklearn import svm, datasets
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.multiclass import OneVsRestClassifier

# Import some data to play with
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Binarize the output
y = label_binarize(y, classes=[0, 1, 2])
n_classes = y.shape[1]

# Add noisy features to make the problem harder
random_state = np.random.RandomState(0)
n_samples, n_features = X.shape
X = np.c_[X, random_state.randn(n_samples, 200 * n_features)]

# shuffle and split training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5,
                                                    random_state=0)

# Learn to predict each class against the other
classifier = OneVsRestClassifier(svm.SVC(kernel='linear', probability=True,
                                 random_state=random_state))
y_score = classifier.fit(X_train, y_train).decision_function(X_test)

# Compute ROC curve and ROC area for each class
fpr = dict()
tpr = dict()
roc_auc = dict()
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])

# Compute micro-average ROC curve and ROC area
fpr["micro"], tpr["micro"], _ = roc_curve(y_test.ravel(), y_score.ravel())
roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])
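To make the averaging terms concrete without any library, here is a minimal plain-Python sketch of one-vs-rest AUC with macro- and micro-averaging. It is not the scikit-learn implementation: the helper names (binary_auc, one_vs_rest_auc) and the toy colour data are made up for illustration, and the AUC is computed through its rank interpretation (the chance that a random positive sample scores higher than a random negative one, with ties counted as one half).

```python
def binary_auc(labels, scores):
    """AUC for a binary problem via pairwise comparisons (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(y_true, y_score, classes):
    """Return (per-class AUCs, macro-average AUC, micro-average AUC).

    y_score[k][j] is the score of sample k for classes[j].
    """
    per_class = {}
    pooled_labels, pooled_scores = [], []
    for j, cls in enumerate(classes):
        # Binarize: the current class is positive, every other class negative.
        labels = [1 if y == cls else 0 for y in y_true]
        scores = [row[j] for row in y_score]
        per_class[cls] = binary_auc(labels, scores)
        pooled_labels.extend(labels)
        pooled_scores.extend(scores)
    # Macro: unweighted mean of the per-class AUCs.
    macro = sum(per_class.values()) / len(classes)
    # Micro: pool every (sample, class) decision into one binary problem.
    micro = binary_auc(pooled_labels, pooled_scores)
    return per_class, macro, micro


# Hypothetical toy data: three of the colour classes, two samples each.
classes = ["red", "green", "blue"]
y_true = ["red", "green", "blue", "red", "green", "blue"]
y_score = [
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
    [0.4, 0.5, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.2, 0.6],
]

per_class, macro, micro = one_vs_rest_auc(y_true, y_score, classes)
print(per_class, macro, micro)
```

The macro average treats every class equally, while the micro average weights classes by how many samples they contribute, so the two can differ noticeably on imbalanced data.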

I'm actually looking for a JavaScript solution (using https://github.com/mljs/performance ), so I haven't implemented it with the above library, but it's the most illuminating example I've found so far.

Answer 2

I know this is an old question, but the fact that the only answer is written in Python bothers me a lot, given that the question specifically asks for an R solution.

As you can see from the code below, I am using the pROC::multiclass.roc() function. The only requirement to make it work is that the column names of the predictions matrix match the true classes (real_values).

The first example generates random predictions. The second one generates a better prediction. The third one generates the perfect prediction (i.e., always assigning the highest probability to the true class).

library(pROC)
set.seed(42)
# True class of each of the three samples
real_values <- matrix( c("class1", "class2", "class3"), nc=1 )
head(real_values)

# [,1]    
# [1,] "class1"
# [2,] "class2"
# [3,] "class3"

# Random predictions
random_preds <- matrix(rbeta(3*3,2,2), nc=3)
# Normalize each row so the class scores sum to 1
random_preds <- sweep(random_preds, 1, rowSums(random_preds), FUN="/")
colnames(random_preds) <- c("class1", "class2", "class3")


head(random_preds)

#       class1    class2    class3
# [1,] 0.3437916 0.6129104 0.4733117
# [2,] 0.6016169 0.4700832 0.9364681
# [3,] 0.6741742 0.8677781 0.4823129

multiclass.roc(real_values, random_preds)
#Multi-class area under the curve: 0.1667



better_preds <- matrix(c(0.75,0.15,0.5,
                         0.15,0.5,0.75,
                         0.15,0.75,0.5), nc=3)
colnames(better_preds) <- c("class1", "class2", "class3")

head(better_preds)

#       class1 class2 class3
# [1,]   0.75   0.15   0.15
# [2,]   0.15   0.50   0.75
# [3,]   0.50   0.75   0.50

multiclass.roc(real_values, better_preds)
#Multi-class area under the curve: 0.6667


perfect_preds <- matrix(c(0.75,0.15,0.5,
                          0.15,0.75,0.5,
                          0.15,0.5,0.75), nc=3)
colnames(perfect_preds) <- c("class1", "class2", "class3")
head(perfect_preds)

multiclass.roc(real_values, perfect_preds)
#Multi-class area under the curve: 1
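
As a cross-check on the numbers above, here is a minimal plain-Python sketch of the Hand and Till (2001) multi-class AUC, which is the definition the pROC documentation cites for multiclass.roc(). It is not pROC's code: the function names are made up for illustration, and it simply averages, over every pair of classes, the two AUCs obtained by ranking the samples of that pair on each class's own score column. The matrices are the rows of better_preds and perfect_preds from the R code.

```python
from itertools import combinations


def pairwise_auc(pos_scores, neg_scores):
    """Chance that a positive sample outscores a negative one (ties count 0.5)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


def hand_till_auc(y_true, probs, classes):
    """Mean of the pairwise AUCs over all unordered class pairs."""
    col = {c: j for j, c in enumerate(classes)}
    pair_aucs = []
    for a, b in combinations(classes, 2):
        # Only samples whose true class is a or b take part in this pair.
        rows_a = [row for y, row in zip(y_true, probs) if y == a]
        rows_b = [row for y, row in zip(y_true, probs) if y == b]
        # Separate a from b using a's column, then b from a using b's column.
        a_vs_b = pairwise_auc([r[col[a]] for r in rows_a],
                              [r[col[a]] for r in rows_b])
        b_vs_a = pairwise_auc([r[col[b]] for r in rows_b],
                              [r[col[b]] for r in rows_a])
        pair_aucs.append((a_vs_b + b_vs_a) / 2)
    return sum(pair_aucs) / len(pair_aucs)


classes = ["class1", "class2", "class3"]
real_values = ["class1", "class2", "class3"]

# Rows of better_preds as printed by head(better_preds) in the R example.
better_preds = [
    [0.75, 0.15, 0.15],
    [0.15, 0.50, 0.75],
    [0.50, 0.75, 0.50],
]
# Rows of perfect_preds (the R matrix is filled column by column).
perfect_preds = [
    [0.75, 0.15, 0.15],
    [0.15, 0.75, 0.50],
    [0.50, 0.50, 0.75],
]

better_auc = hand_till_auc(real_values, better_preds, classes)
perfect_auc = hand_till_auc(real_values, perfect_preds, classes)
print(round(better_auc, 4), perfect_auc)  # 0.6667 1.0
```

In the better_preds case the class1/class2 and class1/class3 pairs are separated perfectly while class2/class3 is ranked the wrong way round in both directions, which gives (1 + 1 + 0) / 3 = 0.6667, the value reported above.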
