ROC curves for Random Forest fit objects using pROC in R: using positive or negative "votes" as predictor
Obese is a binary response variable, with 1 indicating obese and 0 not obese. Weight is a continuous predictor.
Using a random forest to classify obese:
library(randomForest)
rf <- randomForest(factor(obese) ~ weight)
gives us a fit object containing:
> summary(rf)
Length Class Mode
call 2 -none- call
type 1 -none- character
predicted 100 factor numeric
err.rate 1500 -none- numeric
confusion 6 -none- numeric
votes 200 matrix numeric
oob.times 100 -none- numeric
classes 2 -none- character
importance 1 -none- numeric
importanceSD 0 -none- NULL
localImportance 0 -none- NULL
proximity 0 -none- NULL
ntree 1 -none- numeric
mtry 1 -none- numeric
forest 14 -none- list
y 100 factor numeric
test 0 -none- NULL
inbag 0 -none- NULL
terms 3 terms call
I believe the votes matrix shows the share of votes, from 0 to 1, that the RF gives each case for either class; not obese = 0, obese = 1:
> head(rf$votes, 20)
0 1
1 0.9318182 0.06818182
2 0.9325843 0.06741573
3 0.2784091 0.72159091
4 0.9040404 0.09595960
5 0.3865979 0.61340206
6 0.9689119 0.03108808
7 0.8187135 0.18128655
8 0.7170732 0.28292683
9 0.6931217 0.30687831
10 0.9831461 0.01685393
11 0.3425414 0.65745856
12 1.0000000 0.00000000
13 0.9728261 0.02717391
14 0.9848485 0.01515152
15 0.8783069 0.12169312
16 0.8553459 0.14465409
17 1.0000000 0.00000000
18 0.3389831 0.66101695
19 0.9316770 0.06832298
20 0.9435897 0.05641026
Taking those:
votes_2 <- rf$votes[,2]
votes_1 <- rf$votes[,1]
My question is: why do
pROC::plot.roc(obese, votes_1)
and
pROC::plot.roc(obese, votes_2)
produce the same result?
The first thing to realize is that ROC analysis doesn't care about the exact values of your data. Instead it looks at the ranking of the data points, and how the ranks separate.
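A quick way to convince yourself of this (a sketch with synthetic data, assuming the pROC package is installed): any strictly increasing transformation of the predictor preserves the ranks, and therefore leaves the ROC curve and AUC unchanged.

```r
library(pROC)

set.seed(1)
y <- rep(c(0, 1), each = 50)
score <- rnorm(100, mean = y)  # class 1 tends to score higher

auc1 <- auc(roc(y, score, quiet = TRUE))
auc2 <- auc(roc(y, exp(score), quiet = TRUE))  # monotone transform of the score
auc1; auc2  # identical, because only the ranking matters
```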
Second, as has been mentioned in a comment above, the votes for classes 0 and 1 sum up to 1 in each observation. This means that in terms of ranking, the two are equivalent (modulo the direction of sorting).
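Concretely (a toy illustration with made-up numbers, not the actual rf$votes values): because each row sums to 1, the second column is just 1 minus the first, so sorting by one column is exactly the reverse of sorting by the other.

```r
votes_1 <- c(0.93, 0.28, 0.90, 0.39, 0.97)  # hypothetical rf$votes[, 1] values
votes_2 <- 1 - votes_1                      # rows of rf$votes sum to 1

# With no ties, the two rankings are mirror images of each other:
rank(votes_1) + rank(votes_2)  # every entry equals length(votes_1) + 1
```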
The last piece of the puzzle is that pROC doesn't assume that you are providing the predictor as the probability of belonging to the positive class. Instead you can pass any kind of score, and the direction of the comparison is detected automatically. This is done silently by default, but you can see what happens by setting the quiet flag to FALSE:
> pROC::roc(obese, votes_1, quiet = FALSE)
Setting levels: control = 0, case = 1
Setting direction: controls < cases
> pROC::roc(obese, votes_2, quiet = FALSE)
Setting levels: control = 0, case = 1
Setting direction: controls > cases
Notice how in the case of votes_2 it detected that the negative class had higher values (based on the median) and set the direction of the comparison accordingly.
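Because the direction is flipped automatically, both columns trace out the same curve and the same AUC. A self-contained sketch, with made-up probabilities standing in for rf$votes:

```r
library(pROC)

set.seed(2)
obese <- rbinom(100, 1, 0.4)
votes_2 <- plogis(rnorm(100, mean = obese))  # stand-in for the class-1 votes
votes_1 <- 1 - votes_2                       # stand-in for the class-0 votes

# The auto-detected directions are opposite, so the AUCs agree:
a1 <- auc(roc(obese, votes_1, quiet = TRUE))
a2 <- auc(roc(obese, votes_2, quiet = TRUE))
a1; a2  # same AUC either way
```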
If this is not what you want, you can always set the class levels and direction parameters explicitly:
> pROC::roc(obese, votes_2, levels = c(0, 1), direction = "<")
This will result in a "reversed" curve, showing votes_2 performing worse than random at detecting the positive class with higher values.
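In that case the reported AUC is the complement of the auto-detected one: the curve is mirrored across the diagonal. A sketch (synthetic continuous scores, so there are no ties):

```r
library(pROC)

set.seed(3)
obese <- rep(c(0, 1), each = 50)
score <- rnorm(100, mean = 1 - obese)  # here controls tend to score higher

a_auto   <- auc(roc(obese, score, quiet = TRUE))  # direction auto-detected
a_forced <- auc(roc(obese, score, levels = c(0, 1), direction = "<",
                    quiet = TRUE))                # direction forced the other way
as.numeric(a_auto) + as.numeric(a_forced)  # sums to 1 (up to rounding)
```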