I'm doing variable selection with the Boruta package in R. Boruta gives me the standard series of boxplots in a single graph, which is useful, but because I have too many predictors, I would like to limit the number of boxplots that appear in the Boruta plot.
Basically, I want to "zoom" in on the right end of the plot, but I have no idea how to do that with the Boruta plot object.
Thanks,
MR
It sounds like a simple question, but the solution is surprisingly convoluted. Perhaps somebody can come up with a quicker or more elegant way...
Here, I create a new function based on the source of plot.Boruta, and add a function argument pars that takes the names of the variables/predictors that we'd like to include in the plot.
As an example, I use the iris dataset to fit a model.
# Fit model to the iris dataset
library(Boruta);
fit <- Boruta(Species ~ ., data = iris, doTrace = 2);
The function generateCol is called internally by plot.Boruta, but it is not exported and is therefore not available outside of the package. However, we need it for our revised plot.Boruta routine.
# generateCol is needed by plot.Boruta
generateCol <- function(x, colCode, col, numShadow) {
  # Checking arguments (&& because both conditions are scalar)
  if (is.null(col) && length(colCode) != 4)
    stop('colCode should have 4 elements.');
  # Generating col: one color per attribute, then one per shadow attribute
  if (is.null(col)) {
    rep(colCode[4], length(x$finalDecision) + numShadow) -> cc;
    cc[c(x$finalDecision == 'Confirmed', rep(FALSE, numShadow))] <- colCode[1];
    cc[c(x$finalDecision == 'Tentative', rep(FALSE, numShadow))] <- colCode[2];
    cc[c(x$finalDecision == 'Rejected', rep(FALSE, numShadow))] <- colCode[3];
    col = cc;
  }
  return(col);
}
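As an alternative to copying the helper by hand, a non-exported function can usually be fetched from a package namespace. The following is only a sketch: it assumes that the installed Boruta version still defines an internal function named generateCol, which may not hold for every release, so it checks first and otherwise leaves the manual copy in place.

```r
# Assumption: the installed Boruta version has an internal function
# called generateCol. If it does not, nothing is overwritten and the
# hand-copied definition keeps being used.
has_internal <- requireNamespace("Boruta", quietly = TRUE) &&
  exists("generateCol", envir = asNamespace("Boruta"), inherits = FALSE)

if (has_internal) {
  # getFromNamespace() retrieves objects that are not exported
  generateCol <- utils::getFromNamespace("generateCol", "Boruta")
}
```

Relying on package internals is fragile, which is one reason to keep the explicit copy.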
We now modify plot.Boruta, and add a function parameter pars, by which we filter our list of variables.
# Modified plot.Boruta
plot.Boruta.sel <- function(
    x,
    pars = NULL,
    colCode = c('green', 'yellow', 'red', 'blue'),
    sort = TRUE,
    whichShadow = c(TRUE, TRUE, TRUE),
    col = NULL, xlab = 'Attributes', ylab = 'Importance', ...) {
  # Checking arguments
  if (!inherits(x, 'Boruta'))
    stop('This function needs Boruta object as an argument.');
  if (is.null(x$ImpHistory))
    stop('Importance history was not stored during the Boruta run.');
  # Removal of -Infs and conversion to a list
  lz <- lapply(1:ncol(x$ImpHistory), function(i)
    x$ImpHistory[is.finite(x$ImpHistory[, i]), i]);
  colnames(x$ImpHistory) -> names(lz);
  # Selection of shadow meta-attributes
  numShadow <- sum(whichShadow);
  lz <- lz[c(rep(TRUE, length(x$finalDecision)), whichShadow)];
  # Generating color vector
  col <- generateCol(x, colCode, col, numShadow);
  # Ordering boxes due to attribute median importance
  if (sort) {
    ii <- order(sapply(lz, stats::median));
    lz <- lz[ii];
    col <- col[ii];
  }
  # Select parameters of interest; subset col as well so that each
  # remaining box keeps its own color
  if (!is.null(pars)) {
    keep <- names(lz) %in% pars;
    lz <- lz[keep];
    col <- col[keep];
  }
  # Final plotting
  graphics::boxplot(lz, xlab = xlab, ylab = ylab, col = col, ...);
  invisible(x);
}
Now all we need to do is call plot.Boruta.sel instead of plot, and specify the variables that we'd like to include.
plot.Boruta.sel(fit, pars = c("Sepal.Length", "Sepal.Width"));
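To "zoom" in on the right end of the plot without typing the names by hand, one option is to rank the attributes by median importance and pass the top few as pars. The helper below is a minimal sketch; top_by_median is a hypothetical name introduced here, not part of Boruta, and the small matrix is made-up data that only mimics the layout of an importance history (one column per attribute, -Inf for rounds in which an attribute was already rejected).

```r
# Hypothetical helper: names of the k attributes with the highest
# median importance, ignoring non-finite entries.
top_by_median <- function(imp_history, attrs, k) {
  med <- apply(imp_history[, attrs, drop = FALSE], 2,
               function(v) stats::median(v[is.finite(v)]))
  ranked <- names(sort(med, decreasing = TRUE))
  utils::head(ranked, k)
}

# Made-up importance history, for illustration only
imp <- cbind(a = c(1, 2, 3),
             b = c(5, -Inf, 7),
             c = c(0.5, 0.2, 0.1),
             shadowMax = c(1, 1, 1))
top2 <- top_by_median(imp, attrs = c("a", "b", "c"), k = 2)
print(top2)

# With the fitted model from this answer, the call would be:
# plot.Boruta.sel(fit, pars = top_by_median(fit$ImpHistory,
#                                           names(fit$finalDecision), k = 2))
```

The shadow columns are left out of the ranking because only the names in attrs are considered; add their names to pars explicitly if they should stay in the plot.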