
ROC in rfe() in caret package for R

I am using the caret package in R to train a radial basis SVM for classification; in addition, a linear SVM is used for variable selection. With metric="Accuracy" this works fine, but ultimately I am more interested in optimizing metric="ROC". While the ROC is calculated for all models that are fit, there seems to be some problem with aggregating the ROC values.

The following is some example code:

library(caret)
library(kernlab)  # provides ksvm() and vanilladot, used in rankSVM below
library(mlbench)

set.seed(0)

data(Sonar)
x<-scale(Sonar[,1:60])
y<-as.factor(Sonar[,61])

# Custom summary function to use both
# defaultSummary() and twoClassSummary
# Also input and output of summary function are printed

svm.summary<-function(data, lev = NULL, model = NULL){
 print(head(data,n=3))
 a<-defaultSummary(data, lev, model)
 b<-twoClassSummary(data, lev, model)
 out<-c(a,b)
 print(out)
 out}

fitControl <- trainControl(
 method = "cv",
 number = 2,
 classProbs = TRUE,
 summaryFunction=svm.summary,
 verboseIter=TRUE,
 allowParallel = FALSE)

# Ranking function: Rank Variables using a linear 
# SVM 

rankSVM<-function(object,x,y) {
 print("ranking")
 obj<-ksvm(x=as.matrix(x), y=y, 
  kernel=vanilladot,
  kpar=list(), C=10,
  scaled=F)
 w<-t(obj@coef[[1]]%*%obj@xmatrix[[1]])
 z<-abs(w)/sqrt(sum(w^2))
 ord<-order(z,decreasing=T)
 data.frame(var=dimnames(z)[[1]][ord],Overall=z[ord])
}


svmFuncs<-getModelInfo("svmRadial",regex=F)

svmFit<-function(x,y,first,last,...) {
 out<-train(x=x,y=as.factor(y),    
  method="svmRadial",
  trControl=fitControl,
  scaled=F,
  metric="Accuracy",
  maximize=T,
  returnData=T)
  out$finalModel}

selectionFunctions<-list(summary=svm.summary,
 fit=svmFit,
 pred=svmFuncs$svmRadial$predict,
 prob=svmFuncs$svmRadial$prob,
 rank=rankSVM,
 selectSize=pickSizeBest,
 selectVar=pickVars)                         

selectionControl<-rfeControl(functions=selectionFunctions,
 rerank=F,
 verbose=T,
 method="cv",
 number=2)

subsets<-c(1,30,60)

svmProfile<-rfe(x=x,y=y,
 sizes=subsets,
 metric="Accuracy",
 maximize=TRUE,
 rfeControl=selectionControl)

svmProfile

The final output is the following:

> svmProfile

Recursive feature selection

Outer resampling method: Cross-Validated (2 fold) 

Resampling performance over subset size:

Variables Accuracy  Kappa ROC   Sens   Spec AccuracySD KappaSD ROCSD  SensSD SpecSD Selected
        1   0.8075 0.6122 NaN 0.8292 0.7825    0.02981 0.06505    NA 0.06153 0.1344        *
       30   0.8028 0.6033 NaN 0.8205 0.7825    0.00948 0.02533    NA 0.09964 0.1344         
       60   0.8028 0.6032 NaN 0.8206 0.7823    0.00948 0.02679    NA 0.12512 0.1635         

The top 1 variables (out of 1):
V49

ROC is NaN. Inspecting the output (verbose output is switched on, and the summary function was patched to print both its output and the first rows of its input) shows that ROC seems to be calculated correctly when the SVMs are tuned in the inner loop:

+ Fold1: sigma=0.01172, C=0.25 
  pred obs         M         R
1    M   R 0.6658878 0.3341122
2    M   R 0.5679477 0.4320523
3    R   R 0.2263576 0.7736424
 Accuracy     Kappa       ROC      Sens      Spec 
0.6730769 0.3480826 0.7961310 0.6428571 0.7083333 
- Fold1: sigma=0.01172, C=0.25 
+ Fold1: sigma=0.01172, C=0.50 
  pred obs         M         R
1    M   R 0.7841249 0.2158751
2    M   R 0.7231365 0.2768635
3    R   R 0.3033492 0.6966508
 Accuracy     Kappa       ROC      Sens      Spec 
0.7692308 0.5214724 0.8407738 0.9642857 0.5416667 
- Fold1: sigma=0.01172, C=0.50 

[...]

However, there seems to be a problem in the outer iteration. "Between" two folds we get the following:

-(rfe) fit Fold1 size:  1 
  pred obs Variables
1    M   R         1
2    M   R         1
3    M   R         1
 Accuracy     Kappa       ROC      Sens      Spec 
0.7864078 0.5662328        NA 0.8727273 0.6875000 
  pred obs Variables
1    R   R        30
2    M   R        30
3    M   R        30
 Accuracy     Kappa       ROC      Sens      Spec 
0.7961165 0.5853939        NA 0.8909091 0.6875000 
  pred obs Variables
1    R   R        60
2    M   R        60
3    M   R        60
 Accuracy     Kappa       ROC      Sens      Spec 
0.7961165 0.5842783        NA 0.9090909 0.6666667 
+(rfe) fit Fold2 size: 60 

So here it seems that the input to the summary function does not contain the class probabilities but the number of variables instead, and so the ROCs cannot be calculated or aggregated correctly. Does anybody know how to prevent this? Did I forget to tell caret to output class probabilities somewhere?

Help is greatly appreciated, as caret is a really cool package to use and would save me plenty of work if I can get this to run correctly.

Thoralf

getModelInfo is designed to get code for train and doesn't automatically work with rfe (I'll make a note of that in the documentation). rfe doesn't look for a slot called prob, and no probability predictions means no ROC summary.
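To see why the outer summary cannot produce an ROC value, it may help to check whether the data frame handed to the summary function carries one probability column per class level. The following is a minimal base-R sketch: hasClassProbs is a hypothetical helper (not part of caret), and the two data frames only mimic the shapes printed in the question.

```r
# Hypothetical helper, not part of caret: TRUE only when `data` has one
# column per class level, which is what an ROC computation needs.
hasClassProbs <- function(data, lev) {
  !is.null(lev) && all(lev %in% colnames(data))
}

lev <- c("M", "R")

# Shape seen in the inner train() loop: class probabilities are present.
innerData <- data.frame(pred = c("M", "M", "R"),
                        obs  = c("R", "R", "R"),
                        M    = c(0.67, 0.57, 0.23),
                        R    = c(0.33, 0.43, 0.77))

# Shape seen in the outer rfe() loop: only pred, obs and Variables.
outerData <- data.frame(pred      = c("M", "M", "M"),
                        obs       = c("R", "R", "R"),
                        Variables = c(1, 1, 1))

hasClassProbs(innerData, lev)  # TRUE
hasClassProbs(outerData, lev)  # FALSE
```

Calling such a check from inside a custom summary function is one way to spot a pred module that returns only class labels.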

You might want to base your code on caretFuncs, which is designed to work with rfe and should automate a lot of what I think you would like to do.

For example, in caretFuncs, the pred module will create class and probability predictions:

function(object, x) {
  tmp <- predict(object, x)
  if(object$modelType == "Classification" &
     !is.null(object$modelInfo$prob)) {
         out <- cbind(data.frame(pred = tmp),
                      as.data.frame(predict(object, x, type = "prob")))
         } else out <- tmp
      out
  }

You might be able to simply plug your rankSVM into caretFuncs$rank.
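A rough sketch of that suggestion, reusing the Sonar setup and the rankSVM function from the question. The names svmRFE, innerControl and outerControl are made up for illustration, and the sketch assumes that caret, kernlab, mlbench and pROC (which twoClassSummary relies on in recent caret versions) are installed. It is not a tested drop-in fix.

```r
library(caret)
library(kernlab)   # ksvm() for the linear ranking SVM
library(mlbench)   # Sonar data

set.seed(0)
data(Sonar)
x <- as.data.frame(scale(Sonar[, 1:60]))
y <- Sonar$Class

# Ranking function from the question: variable weights of a linear SVM.
rankSVM <- function(object, x, y) {
  obj <- ksvm(x = as.matrix(x), y = y, kernel = "vanilladot",
              kpar = list(), C = 10, scaled = FALSE)
  w <- t(obj@coef[[1]] %*% obj@xmatrix[[1]])
  z <- abs(w) / sqrt(sum(w^2))
  ord <- order(z, decreasing = TRUE)
  data.frame(var = dimnames(z)[[1]][ord], Overall = z[ord])
}

# Start from caretFuncs and swap in the summary and ranking modules.
svmRFE <- caretFuncs
svmRFE$summary <- twoClassSummary   # reports ROC, Sens and Spec
svmRFE$rank <- rankSVM

# Inner resampling used by train(); class probabilities are needed for ROC.
innerControl <- trainControl(method = "cv", number = 2,
                             classProbs = TRUE,
                             summaryFunction = twoClassSummary)

outerControl <- rfeControl(functions = svmRFE, method = "cv",
                           number = 2, rerank = FALSE)

# Arguments that rfe() does not use itself are passed on to train()
# by caretFuncs$fit.
svmProfile <- rfe(x, y, sizes = c(1, 30), metric = "ROC",
                  rfeControl = outerControl,
                  method = "svmRadial", trControl = innerControl)

svmProfile$results[, c("Variables", "ROC", "Sens", "Spec")]
```

Here metric = "ROC" only reaches rfe; the inner train() call keeps its default metric, so caret will probably warn that "Accuracy" is missing from the summary and tune on ROC instead.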

Take a look at the feature selection page on the website. It has details about what code modules you will need.
