
Discrepancies in variable importance calculation for a glmnet model in R

I want to calculate variable importance for a glmnet model in R. I am using the glmnet package to fit an elastic net model like this:

library(glmnet)
library(caret)
library(vip)

data_y <- as.vector(mtcars$mpg)
data_x <- as.matrix(mtcars[-1])

fit.glmnet <- glmnet(data_x, data_y, family="gaussian")

set.seed(123)
cvfit.glmnet <- cv.glmnet(data_x, data_y, standardize = TRUE)
cvfit.glmnet$lambda.min
coef(cvfit.glmnet, s = "lambda.min")

Then I used the vip package to get variable importance:

#Using vip package
vip::vi_model(cvfit.glmnet, s = cvfit.glmnet$fit$lambda)

which returns:

# A tibble: 10 x 3
   Variable Importance Sign 
   <chr>         <dbl> <chr>
 1 cyl         -0.886  NEG  
 2 disp         0      NEG  
 3 hp          -0.0117 NEG  
 4 drat         0      NEG  
 5 wt          -2.71   NEG  
 6 qsec         0      NEG  
 7 vs           0      NEG  
 8 am           0      NEG  
 9 gear         0      NEG  
10 carb         0      NEG 

The variable importance values are signed (they can be positive or negative) rather than non-negative, and they are not scaled to the 0-1 or 0-100% range.

Then I tried a customised function from this answer:

#Using function provided in this example
varImp <- function(object, lambda = NULL, ...) {
  
  ## skipping a few lines
  
  beta <- predict(object, s = lambda, type = "coef")
  if(is.list(beta)) {
    out <- do.call("cbind", lapply(beta, function(x) x[,1]))
    out <- as.data.frame(out)
  } else out <- data.frame(Overall = beta[,1])
  out <- abs(out[rownames(out) != "(Intercept)",,drop = FALSE])
  out
}

varImp(cvfit.glmnet, lambda = cvfit.glmnet$lambda.min)

It returns the following output:

        Overall
cyl  0.88608541
disp 0.00000000
hp   0.01168438
drat 0.00000000
wt   2.70814703
qsec 0.00000000
vs   0.00000000
am   0.00000000
gear 0.00000000
carb 0.00000000

Although the output from the customised function does not contain negative values, it still does not vary within 0-1 or 0-100%.

I know that the caret package has a varImp function that gives variable importance between 0 and 100%. But I want to do the same thing for a cv.glmnet object instead of a caret::train object. How can I get caret-style variable importance for a cv.glmnet object?

The question asks how to obtain glmnet variable importance between 0 and 100%.

If the goal is to assign importance based on coefficient magnitude at a certain (usually optimal) penalty, and these coefficients are derived from standardized variables (the default in glmnet), then the coefficients can simply be scaled to the 0-1 range.

A slightly modified version of the function from the question:

varImp <- function(object, lambda = NULL, ...) {
  beta <- predict(object, s = lambda, type = "coef")
  if(is.list(beta)) {
    out <- do.call("cbind", lapply(beta, function(x) x[,1]))
    out <- as.data.frame(out)
  } else out <- data.frame(Overall = beta[,1])
  out <- abs(out[rownames(out) != "(Intercept)",,drop = FALSE])
  out <- out/max(out)
  out[order(out$Overall, decreasing = TRUE),,drop=FALSE]
}

Using the example in the question:

varImp(cvfit.glmnet, lambda = cvfit.glmnet$lambda.min)
#output
         Overall
wt   1.000000000
cyl  0.320796270
am   0.004840186
hp   0.004605913
disp 0.000000000
drat 0.000000000
qsec 0.000000000
vs   0.000000000
gear 0.000000000
carb 0.000000000
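
Since the question asks for a 0-100% scale, the same idea can be taken one step further. The following is a minimal sketch, not caret's own code: it min-max rescales absolute coefficients so that the largest becomes 100 and the smallest becomes 0. The helper name `scale_importance` and its `upper` argument are hypothetical, and the coefficient values are copied from the output printed in the question.

```r
# Hypothetical helper (not part of caret, glmnet or vip): min-max rescale
# absolute coefficients to the range 0..upper and sort them in decreasing order.
scale_importance <- function(coefs, upper = 100) {
  # Drop the intercept if present and ignore the sign of each coefficient
  imp <- abs(coefs[names(coefs) != "(Intercept)"])
  span <- max(imp) - min(imp)
  if (span == 0) {
    # All coefficients have equal magnitude: no ranking is possible
    imp[] <- 0
    return(imp)
  }
  sort((imp - min(imp)) / span * upper, decreasing = TRUE)
}

# Coefficients at lambda.min as printed in the question (signs restored)
coefs <- c(cyl = -0.88608541, disp = 0, hp = -0.01168438, drat = 0,
           wt = -2.70814703, qsec = 0, vs = 0, am = 0, gear = 0, carb = 0)

imp <- scale_importance(coefs)
print(imp)
```

With the fitted object from the question, the input vector could be obtained with `coef(cvfit.glmnet, s = "lambda.min")[, 1]`, assuming glmnet is loaded and `cvfit.glmnet` exists as defined above.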

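One point worth checking before relying on raw magnitudes: glmnet standardizes the predictors internally by default, but according to its documentation the coefficients are always returned on the original scale of each variable. A minimal sketch, assuming the lambda.min coefficients printed in the question and using base R only, converts them to a per-standard-deviation scale before applying the 0-1 scaling. The names `coefs`, `sds` and `std_imp` are illustrative and not part of any package.

```r
# Predictors from the question (mtcars ships with base R)
x <- as.matrix(mtcars[-1])

# Coefficients at lambda.min as printed in the question (signs restored)
coefs <- c(cyl = -0.88608541, disp = 0, hp = -0.01168438, drat = 0,
           wt = -2.70814703, qsec = 0, vs = 0, am = 0, gear = 0, carb = 0)

# Standard deviation of each predictor. Any constant factor shared by all
# columns (for example an n versus n - 1 denominator) cancels out after
# dividing by the maximum below.
sds <- apply(x, 2, sd)

# Coefficient per one standard deviation of the predictor
std_coefs <- coefs * sds[names(coefs)]

# Scale absolute standardized coefficients to 0-1 and sort
std_imp <- sort(abs(std_coefs) / max(abs(std_coefs)), decreasing = TRUE)
print(std_imp)
```

On this scale a variable measured in large units, such as hp, is no longer penalised for having a numerically small coefficient, so its relative importance can differ noticeably from the raw-coefficient ranking.
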
Another approach to assigning variable importance to glmnet models is to score the variables by the penalty at which they enter the model: variables are more important if they are excluded only at higher penalties. This approach will be implemented in the mlr3 package at some point: https://github.com/mlr-org/mlr3learners/issues/28
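
The penalty-based idea can be sketched as follows. This is only an illustration under stated assumptions, not the mlr3 implementation: the helper `entry_lambda` is hypothetical, and the small coefficient path below is made-up toy data, not output from glmnet. Each variable is scored by the largest penalty at which its coefficient is non-zero.

```r
# Hypothetical helper: score each variable by the largest lambda at which
# its coefficient is non-zero, then scale the scores to 0-1.
# beta:   matrix of coefficients, one row per variable, one column per lambda
# lambda: penalty values, one per column of beta
entry_lambda <- function(beta, lambda) {
  stopifnot(ncol(beta) == length(lambda))
  score <- apply(beta != 0, 1, function(active) {
    # Variables that never enter the model get a score of 0
    if (any(active)) max(lambda[active]) else 0
  })
  if (max(score) > 0) score <- score / max(score)
  sort(score, decreasing = TRUE)
}

# Toy coefficient path (illustrative values only)
lambda <- c(4, 2, 1, 0.5)
beta <- rbind(
  wt  = c(0, -1.0, -2.0, -2.50),
  cyl = c(0,  0.0, -0.5, -0.80),
  hp  = c(0,  0.0,  0.0, -0.01),
  vs  = c(0,  0.0,  0.0,  0.00)
)

res <- entry_lambda(beta, lambda)
print(res)
```

For the model in the question, the inputs would presumably be `as.matrix(fit.glmnet$beta)` and `fit.glmnet$lambda`, assuming glmnet is loaded and `fit.glmnet` exists as defined above.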
