從帶系數 (R) 的 RIDGE、LASSO 和凈彈性回歸中為許多變量提取公式

Question

我正在嘗試修改我在這篇文章的一個答案中找到的一些代碼：

AlexB 提供了這些精彩的代碼行：

get_formula <- function(model) {
  broom::tidy(model)[, 1:2] %>%
    mutate(sign = ifelse(sign(estimate) == 1, ' + ', ' - ')) %>% #coeff signs
    mutate_if(is.numeric, ~ abs(round(., 2))) %>% #for improving formatting
    mutate(a = ifelse(term == '(Intercept)', paste0('y ~ ', estimate), paste0(sign, estimate, ' * ', term))) %>%
    summarise(formula = paste(a, collapse = '')) %>%
    as.character
}

雖然這適用於我的一些代碼，但我在調整它以使用 RIDGE、LASSO 和 Net Elastic Regression 從 gl.net 模型打印公式時遇到了問題。

下面附上了我要提供的示例：

library(caret)
library(glmnet)
library(mlbench)
library(psych)
data("BostonHousing")
data <- BostonHousing
set.seed(23)
ind <- sample(2, nrow(data), replace = T, prob = c(0.7, 0.3))
train <- data[ind==1,]
test <- data[ind==2,]
custom <- trainControl(method = "repeatedcv",number = 10,repeats = 5,verboseIter = T)
set.seed(23)
ridge <- train(medv~., train,method = "glmnet",tuneGrid = expand.grid(alpha = 0,lambda = seq(0.0001,1,length = 5)),trControl = custom)
ridge
coef(ridge$finalModel, ridge$bestTune$lambda) # the coefficient estimates

get_formula <- function(model) {
  broom::tidy(model)[, 1:2] %>%
    mutate(sign = ifelse(sign(estimate) == 1, ' + ', ' - ')) %>% #coeff signs
    mutate_if(is.numeric, ~ abs(round(., 2))) %>% #for improving formatting
    mutate(a = ifelse(term == '(Intercept)', paste0('y ~ ', estimate), paste0(sign, estimate, ' * ', term))) %>%
    summarise(formula = paste(a, collapse = '')) %>%
    as.character
}
get_formula(ridge$finalModel)

但是，鑒於它與上一篇文章的格式不同，我在修改 function 時遇到問題，以便它可以打印出我正在尋找的方程式。

給出錯誤：

Error: Problem with `mutate()` input `sign`.
x object 'estimate' not found
i Input `sign` is `ifelse(sign(estimate) == 1, " + ", " - ")`.
Run `rlang::last_error()` to see where the error occurred.

感謝您的幫助。

Answer 1

broom package 有一個gl.net的tidy變體- 您不需要使用[, 1:2]索引整理數據。

只需使用tidy(model)和 pipe 的 rest 就可以正常工作。

下面是function的關鍵部分，拿出來演示一下：


broom::tidy(ridge$finalModel) %>%
  mutate(sign = ifelse(sign(estimate) == 1, ' + ', ' - ')) %>% #coeff signs
  mutate_if(is.numeric, ~ abs(round(., 2))) %>% #for improving formatting
  mutate(a = ifelse(term == '(Intercept)', paste0('y ~ ', estimate), paste0(sign, estimate, ' * ', term))) 

# A tibble: 1,400 x 7
   term         step estimate lambda dev.ratio sign  a        
   <chr>       <dbl>    <dbl>  <dbl>     <dbl> <chr> <chr>    
 1 (Intercept)     1     21.7  6655.      0    " + " y ~ 21.68
 2 (Intercept)     2     21.7  6064.      0.01 " + " y ~ 21.73
 3 (Intercept)     3     21.7  5525.      0.01 " + " y ~ 21.73
 4 (Intercept)     4     21.7  5034.      0.01 " + " y ~ 21.74
 5 (Intercept)     5     21.7  4587.      0.01 " + " y ~ 21.74
 6 (Intercept)     6     21.8  4180.      0.01 " + " y ~ 21.75
 7 (Intercept)     7     21.8  3808.      0.01 " + " y ~ 21.75
 8 (Intercept)     8     21.8  3470.      0.01 " + " y ~ 21.76
 9 (Intercept)     9     21.8  3162.      0.01 " + " y ~ 21.77
10 (Intercept)    10     21.8  2881.      0.02 " + " y ~ 21.78
# … with 1,390 more rows

小提示： across現在可以替換mutate_if ，例如

mutate(across(where(is.numeric), ~abs(round(., 2))))

Answer 2

通過較小的更新，您可以獲得嶺回歸的方程，如下所示：

as.matrix(coef(ridge$finalModel, ridge$bestTune$lambda)) %>%
  as.data.frame() %>%
  tibble::rownames_to_column('term') %>%
  rename(estimate = 2) %>%
  mutate(sign = ifelse(sign(estimate) == 1, ' + ', ' - ')) %>% #coeff signs
  mutate(across(where(is.numeric), ~abs(round(., 2)))) %>% #for improving formatting
  mutate(a = ifelse(term == '(Intercept)', paste0('y ~ ', estimate), paste0(sign, estimate, ' * ', term))) %>%
  summarise(formula = paste(a, collapse = ''))

從帶系數 (R) 的 RIDGE、LASSO 和凈彈性回歸中為許多變量提取公式

問題描述

下面附上了我要提供的示例：

給出錯誤：

2 個解決方案

解決方案1
0 2020-12-03 06:16:57

解決方案2
0 已采納 2020-12-07 09:05:46

從帶系數 (R) 的 RIDGE、LASSO 和凈彈性回歸中為許多變量提取公式

問題描述

下面附上了我要提供的示例：

給出錯誤：

2 個解決方案

解決方案1 0 2020-12-03 06:16:57

解決方案2 0 已采納 2020-12-07 09:05:46

解決方案1
0 2020-12-03 06:16:57

解決方案2
0 已采納 2020-12-07 09:05:46