Extracting coefficient variable names from glmnet into a data.frame

Question

I would like to extract the model coefficients generated by glmnet and create a SQL query from them. The function coef(cv.glmnet.fit) yields a 'dgCMatrix' object. When I convert it to a matrix using as.matrix, the variable names are lost and only the coefficient values are left behind.

I know one can print the coefficients on the screen; however, is it possible to write the names to a data frame?

Can anybody help me extract these names?

Answer 1

UPDATE: The first two comments on my answer are both right. I have kept the old answer below the line just for posterity.

The following answer is short, works, and does not need any other package:

tmp_coeffs <- coef(cv.glmnet.fit, s = "lambda.min")
data.frame(name = tmp_coeffs@Dimnames[[1]][tmp_coeffs@i + 1], coefficient = tmp_coeffs@x)

The reason for the + 1 is that the @i slot stores 0-based row indices (the intercept is row 0), whereas @Dimnames[[1]] is indexed from 1.
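To see why the shift by one is needed, here is a small sketch that builds a sparse column by hand with the Matrix package (the package that defines the dgCMatrix class). No model is fitted; the names and values are made up purely for illustration.

```r
library(Matrix)  # defines the dgCMatrix class that coef() returns

# Hypothetical coefficient column: two of the four entries are zero
vals <- c(1.5, 0, -0.7, 0)
m <- Matrix(matrix(vals, ncol = 1,
                   dimnames = list(c("(Intercept)", "x1", "x2", "x3"), "s1")),
            sparse = TRUE)

m@i  # 0-based row positions of the non-zero entries: 0 2
m@x  # the non-zero values themselves: 1.5 -0.7

# Shift by one to index the 1-based vector of names
coef_df <- data.frame(name = m@Dimnames[[1]][m@i + 1],
                      coefficient = m@x)
coef_df
```

Only the rows for (Intercept) and x2 end up in coef_df, because the zero entries are simply not stored in the sparse object.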

OLD ANSWER (kept only for posterity): ~~Try these lines:~~

~~The non-zero coefficients:~~

  coef(cv.glmnet.fit, s = "lambda.min")[which(coef(cv.glmnet.fit, s = "lambda.min") != 0)]

~~The features that are selected:~~

  colnames(regression_data)[which(coef(cv.glmnet.fit, s = "lambda.min") != 0)]

~~Then putting them together as a data frame is straightforward, but let me know if you want that part of the code as well.~~

Answer 2

The names should be available as dimnames(coef(cv.glmnet.fit))[[1]], so the following should put both the coefficient names and values into a data.frame:

data.frame(coef.name = dimnames(coef(GLMNET))[[1]], coef.value = matrix(coef(GLMNET)))

Answer 3

Check the broom package. It has a tidy function that converts the output of different R objects (including glmnet) into data.frames.
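A minimal sketch of the broom route on simulated data. One caveat, offered as my understanding of current broom rather than something stated in the answer: tidy() on a cv.glmnet object summarises the cross-validation curve (one row per lambda), so to get coefficients it is applied here to the underlying glmnet fit stored in $glmnet.fit. The data and object names are made up.

```r
library(glmnet)
library(broom)

set.seed(1)
X <- matrix(rnorm(100 * 5), 100, 5,
            dimnames = list(NULL, paste0("feature", 1:5)))
y <- X[, 1] - 2 * X[, 2] + rnorm(100)

cvfit <- cv.glmnet(X, y)

# One row per (term, lambda) pair; zero coefficients are dropped by default
all_coefs <- tidy(cvfit$glmnet.fit)

# Keep the rows for the lambda on the path that is closest to lambda.min
best <- all_coefs$lambda[which.min(abs(all_coefs$lambda - cvfit$lambda.min))]
coef_df <- as.data.frame(all_coefs[all_coefs$lambda == best,
                                   c("term", "estimate")])
coef_df
```

Matching on the nearest lambda rather than testing for exact equality is a deliberate choice here, to avoid depending on floating-point identity between the two objects.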

Answer 4

Building on Mehrad's solution above, here is a simple function to print a table containing only the non-zero coefficients:

library(knitr)  # for kable()

print_glmnet_coefs <- function(cvfit, s = "lambda.min") {
    coefs <- coef(cvfit, s = s)      # sparse one-column matrix
    ind <- which(coefs != 0)         # positions of the non-zero coefficients
    df <- data.frame(
        feature = rownames(coefs)[ind],
        coefficient = coefs[ind]
    )
    kable(df)
}

The function above uses the kable() function from knitr to produce a Markdown-ready table.

Answer 5

There is an approach using coef() on the glmnet() object (your model). In the case below, the index [[1]] selects the first outcome class of a multinomial logistic regression; for other models you may need to remove it.

coef_names_GLMnet <- coef(GLMnet, s = 0)[[1]]
row.names(coef_names_GLMnet)[coef_names_GLMnet@i+1]

The row.names() index needs incrementing (+ 1) in this case because the @i slot of the coef() object numbers the variables (data features) from 0, whereas the character vector of row names is indexed from 1.
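For a multinomial fit, coef() returns one sparse column per outcome class, so the same trick can be applied to each list element in turn. The following is only a sketch on simulated data; the class labels, feature names, and the value s = 0.01 are arbitrary choices for illustration.

```r
library(glmnet)

set.seed(1)
X <- matrix(rnorm(150 * 4), 150, 4,
            dimnames = list(NULL, paste0("feature", 1:4)))
y <- factor(rep(c("a", "b", "c"), each = 50))
X[y == "b", 1] <- X[y == "b", 1] + 1   # give the classes some separation
X[y == "c", 2] <- X[y == "c", 2] + 1

fit <- glmnet(X, y, family = "multinomial")
coef_list <- coef(fit, s = 0.01)       # named list: one sparse column per class

# Stack the stored (non-zero) coefficients of every class into one data.frame
coef_df <- do.call(rbind, lapply(names(coef_list), function(cls) {
  m <- coef_list[[cls]]
  data.frame(class = rep(cls, length(m@x)),
             name = rownames(m)[m@i + 1],   # @i is 0-based, hence + 1
             coefficient = m@x,
             stringsAsFactors = FALSE)
}))
coef_df
```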

Answer 6

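# Requires tibble (rownames_to_column) and magrittr (%>%).
library(tibble)
library(magrittr)

tidy_coef <- function(x){
    coef(x) %>%
    as.matrix %>%   # Coerce from sparse to regular matrix; as.matrix keeps the row names, matrix() would drop them.
    data.frame %>%  # Then to a data frame.
    rownames_to_column %>%  # Add the row names as an explicit column.
    setNames(c("term","estimate"))
}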

Without tibble:

tidy_coef2 <- function(x){
    x <- coef(x)
    data.frame(term=rownames(x),
               estimate=matrix(x)[,1],
               stringsAsFactors = FALSE)
}

Answer 7

Here, I wrote a reproducible example and fitted a binary (logistic) model using cv.glmnet. A plain glmnet model fit will also work. At the end of this example, I assemble the non-zero coefficients and their associated features into a data.frame called myResults:

library(glmnet)
X <- matrix(rnorm(100*10), 100, 10);
X[51:100, ] <- X[51:100, ] + 0.5; #artificially introduce difference in control cases
rownames(X) <- paste0("observation", 1:nrow(X));
colnames(X) <- paste0("feature",     1:ncol(X));

y <- factor( c(rep(1,50), rep(0,50)) ); #binary outcome class label
y
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [51] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## Levels: 0 1

## Perform logistic model fit:
fit1 <- cv.glmnet(X, y, family="binomial", nfolds=5, type.measure="auc"); #with K-fold cross validation
# fit1 <- glmnet(X, y, family="binomial") #without cross validation also works

## Adapted from @Mehrad Mahmoudian:
myCoefs <- coef(fit1, s="lambda.min");
myCoefs[which(myCoefs != 0 ) ]               #coefficients: intercept included
## [1]  1.4945869 -0.6907010 -0.7578129 -1.1451275 -0.7494350 -0.3418030 -0.8012926 -0.6597648 -0.5555719
## [10] -1.1269725 -0.4375461
myCoefs@Dimnames[[1]][which(myCoefs != 0 ) ] #feature names: intercept included
## [1] "(Intercept)" "feature1"    "feature2"    "feature3"    "feature4"    "feature5"    "feature6"   
## [8] "feature7"    "feature8"    "feature9"    "feature10"  

## Assemble into a data.frame
myResults <- data.frame(
  features = myCoefs@Dimnames[[1]][ which(myCoefs != 0 ) ], #intercept included
  coefs    = myCoefs              [ which(myCoefs != 0 ) ]  #intercept included
)
myResults
##       features      coefs
## 1  (Intercept)  1.4945869
## 2     feature1 -0.6907010
## 3     feature2 -0.7578129
## 4     feature3 -1.1451275
## 5     feature4 -0.7494350
## 6     feature5 -0.3418030
## 7     feature6 -0.8012926
## 8     feature7 -0.6597648
## 9     feature8 -0.5555719
## 10    feature9 -1.1269725
## 11   feature10 -0.4375461
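Since the original goal was a SQL query, here is one way a name/coefficient data.frame like myResults might be turned into a scoring expression. This is a sketch under assumptions that are not in the thread: the feature names are valid column names in a table called my_table (a hypothetical name), and the three values below are rounded from the output above. For a logistic model the expression gives the linear predictor, not a probability.

```r
# Rounded values copied from the first three rows of the myResults output
coef_df <- data.frame(
  features = c("(Intercept)", "feature1", "feature2"),
  coefs    = c(1.4946, -0.6907, -0.7578),
  stringsAsFactors = FALSE
)

is_intercept <- coef_df$features == "(Intercept)"

# The intercept is a bare constant; every other row becomes "(coef * column)"
terms <- c(
  as.character(coef_df$coefs[is_intercept]),
  paste0("(", coef_df$coefs[!is_intercept], " * ",
         coef_df$features[!is_intercept], ")")
)

# my_table is a hypothetical table whose columns match the feature names
sql <- paste0("SELECT ", paste(terms, collapse = " + "),
              " AS linear_predictor FROM my_table")
sql
## [1] "SELECT 1.4946 + (-0.6907 * feature1) + (-0.7578 * feature2) AS linear_predictor FROM my_table"
```

Feature names that are not plain identifiers would need quoting in whatever way the target database expects, which this sketch does not attempt.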

Answer 8

Assuming you know how to obtain your lambda, I found two different ways to show the predictors needed in the selected model for that particular lambda. One of them includes the intercept. The lambda can be obtained using cross-validation by means of cv.glmnet from the glmnet library. You might want to look only at the last lines of each method:

 myFittedLasso = glmnet(x=myXmatrix, y=myYresponse, family="binomial")
 myCrossValidated = cv.glmnet(x=myXmatrix, y=myYresponse, family="binomial")
 myLambda = myCrossValidated$lambda.1se  # can be simply lambda

 # Method 1 without the intercept
 myBetas = myFittedLasso$beta[, which(myFittedLasso$lambda == myLambda)]
 myBetas[myBetas != 0]
 ## myPredictor1    myPredictor2    myPredictor3
 ##   0.24289802      0.07561533      0.18299284


 # Method 2 with the intercept
 myCoefficients = coef(myFittedLasso, s=myLambda)
 dimnames(myCoefficients)[[1]][which(myCoefficients != 0)]
 ## [1] "(Intercept)"    "myPredictor1"    "myPredictor2"    "myPredictor3"

 myCoefficients[which(myCoefficients != 0)]
 ## [1] -4.07805560  0.24289802  0.07561533  0.18299284

Note that the example above assumes a binomial distribution, but the steps can be applied to any other family.

Answer 9

I faced a similar issue when using glmnet from the tidymodels framework, where the model was trained within a workflow and neither coef() nor the above solutions worked.

What worked for me, though, was part of the glmnet:::coef.glmnet code:

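library(data.table)  # for as.data.table()

# taken from glmnet:::coef.glmnet
# x is the fitted glmnet model object extracted from the workflow
# note: "lambda.min" is passed positionally here, not as the s argument
coefs <- predict(x, "lambda.min", type = "coefficients", exact = FALSE)

dd <- cbind(
  data.frame(var = rownames(coefs)),   # coefficient names as a column
  as.data.table(as.matrix(coefs))      # dense copy of the coefficient values
)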

9 answers

Answer 1  23  2015-01-15 15:11:05
Answer 2  6  2015-02-19 21:10:50
Answer 3  4  2015-01-06 14:45:52
Answer 4  4  2015-12-27 13:18:26
Answer 5  2  2016-08-25 16:53:43
Answer 6  2  2016-09-30 21:54:36
Answer 7  2  2017-06-23 18:50:12
Answer 8  1  2017-08-17 01:46:30
Answer 9  0  2021-11-16 09:45:18
