简体   繁体   中英

Discripencies in variable importance calculation for glmnet model in R

I want to calculate variable importance for gl.net model in R. I am using gl.net package for fitting the elastic.net model like

library(glmnet)
library(caret)
library(vip)

data_y <- as.vector(mtcars$mpg)
data_x <- as.matrix(mtcars[-1])

fit.glmnet <- glmnet(data_x, data_y, family="gaussian")

set.seed(123)
cvfit.glmnet = cv.glmnet(data_x, data_y, standardize=T)
cvfit.glmnet$lambda.min
coef(cvfit.glmnet, s = "lambda.min")

Then I have used vip package for variable importance as

#Using vip package
vip::vi_model(cvfit.glmnet, s = cvfit.glmnet$fit$lambda)

which returns me

># A tibble: 10 x 3
   Variable Importance Sign 
   <chr>         <dbl> <chr>
 1 cyl         -0.886  NEG  
 2 disp         0      NEG  
 3 hp          -0.0117 NEG  
 4 drat         0      NEG  
 5 wt          -2.71   NEG  
 6 qsec         0      NEG  
 7 vs           0      NEG  
 8 am           0      NEG  
 9 gear         0      NEG  
10 carb         0      NEG 

The variable importance contains both positive and negative values for the variables at the same time it does not vary between 0-1 or 0-100%.

Then I have tried customised function from this answer

#Using function provided in this example
varImp <- function(object, lambda = NULL, ...) {
  
  ## skipping a few lines
  
  beta <- predict(object, s = lambda, type = "coef")
  if(is.list(beta)) {
    out <- do.call("cbind", lapply(beta, function(x) x[,1]))
    out <- as.data.frame(out)
  } else out <- data.frame(Overall = beta[,1])
  out <- abs(out[rownames(out) != "(Intercept)",,drop = FALSE])
  out
}

varImp(cvfit.glmnet, lambda = cvfit.glmnet$lambda.min)

It returns me following output

        Overall
cyl  0.88608541
disp 0.00000000
hp   0.01168438
drat 0.00000000
wt   2.70814703
qsec 0.00000000
vs   0.00000000
am   0.00000000
gear 0.00000000
carb 0.00000000

Though the output from customised function does not contain negative values, it does vary within 0-1 or 0-100%.

I know that caret package has varImp function which gives variable importance between 0-100%. But I want to implement the same thing for cv.gl.net object instead of caret::train object. How can I achieve the variable importance alike caret package for cv.gl.net object?

The question asks how to obtain gl.net variable importance between 0-100%.

If it is desired to assign importance based on coefficient magnitude at a certain (usually optimal) penalty. And if these coefficients are derived based on standardized variables (default in gl.net) then the coefficients can simply be scaled to the 0 - 1 range:

The slightly modified function is given:

varImp <- function(object, lambda = NULL, ...) {
  beta <- predict(object, s = lambda, type = "coef")
  if(is.list(beta)) {
    out <- do.call("cbind", lapply(beta, function(x) x[,1]))
    out <- as.data.frame(out)
  } else out <- data.frame(Overall = beta[,1])
  out <- abs(out[rownames(out) != "(Intercept)",,drop = FALSE])
  out <- out/max(out)
  out[order(out$Overall, decreasing = TRUE),,drop=FALSE]
}

Using the example in the question:

varImp(cvfit.glmnet, lambda = cvfit.glmnet$lambda.min)
#output
         Overall
wt   1.000000000
cyl  0.320796270
am   0.004840186
hp   0.004605913
disp 0.000000000
drat 0.000000000
qsec 0.000000000
vs   0.000000000
gear 0.000000000
carb 0.000000000

Another approach at assigning variable importance to gl.net models would be scoring the variables based on the penalty for inclusion - Variables are more significant if the are excluded at higher penalties. This approach will be implemented in the mlr3 package: https://github.com/mlr-org/mlr3learners/issues/28 at some point

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM