Boruta box plots in R

I'm doing variable selection with the Boruta package in R. Boruta gives me the standard series of boxplots in a single graph, which is useful, but because I have a large number of predictors, I would like to limit the number of boxplots that appear in the Boruta plot. Something like the following image.

[Image: Boruta plot]

Basically, I want to "zoom" in on the right end of the plot, but have no idea how to do that with the Boruta plot object.

Thanks,

MR

It sounds like a simple question, but the solution seems surprisingly convoluted. Perhaps somebody can come up with a quicker or more elegant way.

Here, I create a new function based on the source of plot.Boruta, and add a function argument pars that takes the names of the variables/predictors that we'd like to include in the plot.

As an example, I use the iris dataset to fit a model.

# Fit model to the iris dataset
library(Boruta);
fit <- Boruta(Species ~ ., data = iris, doTrace = 2);

The function generateCol is called internally by plot.Boruta, but it is not exported and is therefore not available outside of the package. However, we need the function for our revised plot.Boruta routine.

# generateCol is needed by plot.Boruta
generateCol<-function(x,colCode,col,numShadow){
 #Checking arguments
 if(is.null(col) && length(colCode)!=4)
  stop('colCode should have 4 elements.');
 #Generating col
 if(is.null(col)){
  rep(colCode[4],length(x$finalDecision)+numShadow)->cc;
  cc[c(x$finalDecision=='Confirmed',rep(FALSE,numShadow))]<-colCode[1];
  cc[c(x$finalDecision=='Tentative',rep(FALSE,numShadow))]<-colCode[2];
  cc[c(x$finalDecision=='Rejected',rep(FALSE,numShadow))]<-colCode[3];
  col=cc;
 }
 return(col);
}
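
If you would rather not copy the source, a possible alternative is to fetch the internal function from the package namespace. This is only a sketch: it assumes the installed Boruta version still defines an internal function named generateCol, which may change between releases, and the variable name gen_col is just for illustration.

```r
# Sketch: reuse Boruta's internal generateCol instead of copying its source.
# Assumption: the installed Boruta version defines an internal generateCol.
gen_col <- NULL
if (requireNamespace("Boruta", quietly = TRUE) &&
    exists("generateCol", envir = asNamespace("Boruta"), inherits = FALSE)) {
  # getFromNamespace retrieves objects that the package does not export
  gen_col <- utils::getFromNamespace("generateCol", "Boruta")
}
# gen_col stays NULL if the package or the function is not available
```

If gen_col is not NULL, it can stand in for the copied generateCol above.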

We now modify plot.Boruta and add a function parameter pars, which we use to filter the list of variables.

# Modified plot.Boruta
plot.Boruta.sel <- function(
    x,
    pars = NULL,
    colCode = c('green','yellow','red','blue'),
    sort = TRUE,
    whichShadow = c(TRUE, TRUE, TRUE),
    col = NULL, xlab = 'Attributes', ylab = 'Importance', ...) {

    #Checking arguments
    if(!inherits(x, 'Boruta'))
        stop('This function needs Boruta object as an argument.');
    if(is.null(x$ImpHistory))
        stop('Importance history was not stored during the Boruta run.');

    #Removal of -Infs and conversion to a list
    lz <- lapply(1:ncol(x$ImpHistory), function(i)
        x$ImpHistory[is.finite(x$ImpHistory[,i]),i]);
    colnames(x$ImpHistory)->names(lz);

    #Selection of shadow meta-attributes
    numShadow <- sum(whichShadow);
    lz <- lz[c(rep(TRUE,length(x$finalDecision)), whichShadow)];

    #Generating color vector
    col <- generateCol(x, colCode, col, numShadow);

    #Ordering boxes due to attribute median importance
    if (sort) {
        ii <- order(sapply(lz, stats::median));
        lz <- lz[ii];
        col <- col[ii];
    }

    # Select parameters of interest, keeping the colors aligned with the boxes
    if (!is.null(pars)) {
        keep <- names(lz) %in% pars;
        lz <- lz[keep];
        col <- col[keep];
    }

    #Final plotting
    graphics::boxplot(lz, xlab = xlab, ylab = ylab, col = col, ...);
    invisible(x);
}

Now all we need to do is call plot.Boruta.sel instead of plot, and specify the variables that we'd like to include.

plot.Boruta.sel(fit, pars = c("Sepal.Length", "Sepal.Width"));
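
Since the goal was to "zoom" in on the right end of the plot, it may be easier to compute the names for pars than to type them. The sketch below uses a hypothetical helper, top_attributes (not part of Boruta), that ranks the columns of an ImpHistory-style matrix by median importance and returns the top k names. It assumes the shadow columns are named with a "shadow" prefix (shadowMax, shadowMean, shadowMin), and the small matrix is made-up data standing in for fit$ImpHistory.

```r
# Hypothetical helper (not part of Boruta): names of the k attributes with the
# highest median importance, computed from an ImpHistory-style matrix.
top_attributes <- function(imp_history, k = 10, drop_shadow = TRUE) {
  # Median over finite values only, mirroring the -Inf removal in plot.Boruta
  med <- apply(imp_history, 2, function(v) stats::median(v[is.finite(v)]))
  # Assumption: shadow attributes are the columns whose names start with "shadow"
  if (drop_shadow) med <- med[!grepl("^shadow", names(med))]
  # sort() drops NA medians (columns with no finite values) by default
  head(names(sort(med, decreasing = TRUE)), k)
}

# Made-up matrix standing in for fit$ImpHistory
imp <- cbind(a = c(1, 2, 3),
             b = c(10, -Inf, 12),
             c = c(5, 5, 5),
             shadowMax = c(4, 4, 4))
top <- top_attributes(imp, k = 2)
print(top)
```

With a real fit, the call would look like plot.Boruta.sel(fit, pars = top_attributes(fit$ImpHistory, k = 10)). Because plot.Boruta.sel sorts by median before filtering, the selected boxes still appear in increasing order of importance.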

