I have a data frame (df) of 72 observations and 592 variables plus one factor class variable (593 variables in total, i.e. dim(df) = 72 593). I am looking for a way to select 7 variables (including the class variable) using Receiver Operating Characteristics (ROC) for selection of the optimum k value. I want to use these seven variables for analysis with graphical models, but I don't want to select them at random. I want my selection to be statistically justified.
What I would like to see as my result is something like:
Variables V23, V120, V230, V333, V496, V585, V593 were selected based on the highest value of ROC.
I.e., I want to perform classification and select the "best" predictor variables, those with high accuracy, so that I can use these variables for graphical modelling.
I have tried using the caret package but I don't know how to manipulate it to select variables (columns) of high accuracy which can be used for other analysis.
Thanks guys. I'm sure someone will understand what I mean.
Thanks.
kutex.
I would do something like this:
library(pROC)

#' Select the top N variables by univariate ROC AUC
#' @param response the class variable name
#' @param predictors the variable names from which to select
#' @param data data frame that contains the response and the predictors as columns
#' @param n the number of variables to select
select.top.N.ROC <- function(response, predictors, data, n) {
  n <- min(n, length(predictors))
  # AUC of each predictor taken on its own against the response
  aucs <- sapply(predictors, function(predictor) {
    as.numeric(auc(data[[response]], data[[predictor]]))
  })
  # Predictor names sorted by decreasing AUC, truncated to the first n
  predictors[order(aucs, decreasing = TRUE)][seq_len(n)]
}

# Adjust the response name and the predictor range to match your own columns
top.variables <- select.top.N.ROC("class", paste0("V", 1:593), myDataFrame, 7)
cat("Variables", paste(top.variables, collapse = ", "), "were selected based on the highest value of ROC.\n")
As with any univariate feature selection method, the selected variables may be highly correlated with one another. In the extreme case, 7 fully correlated variables carry no more information than any one of them, so selecting V23 alone would have been sufficient. For multivariate datasets, you should consider using a multivariate feature selection method instead.
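If you want to stay with the univariate ranking, one rough way to limit the redundancy problem is to walk down the ranked list and skip any variable that is too strongly correlated with one you have already kept. The sketch below is only an illustration, not a proper multivariate selection method: the helper name drop.redundant, the max.cor threshold of 0.9 and the toy data frame are all my own assumptions, and it uses base R only.

```r
# Hypothetical helper (not part of pROC or caret): go through the ranked
# variables from best to worst and keep a variable only if its absolute
# Pearson correlation with every variable kept so far is below max.cor.
drop.redundant <- function(ranked, data, n, max.cor = 0.9) {
  kept <- character(0)
  for (v in ranked) {
    if (length(kept) >= n) break
    r <- if (length(kept) > 0) {
      max(abs(cor(data[[v]], as.matrix(data[, kept, drop = FALSE]))))
    } else {
      0
    }
    if (r < max.cor) kept <- c(kept, v)
  }
  kept
}

# Toy data (assumed for illustration): B is a linear copy of A, C is not.
demo <- data.frame(
  A = 1:10,
  B = 2 * (1:10) + 1,
  C = rep(c(1, -1), 5)
)

# Pretend the ROC ranking came out as A, B, C and we want two variables.
selected <- drop.redundant(c("A", "B", "C"), demo, n = 2)
print(selected)
```

With real data you would pass the full ranked list of predictor names (not just the top 7) as ranked, so that there are replacements available when a variable is skipped. The threshold is a judgment call, and this only catches pairwise linear redundancy.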