
Standard Error of Ridge Logistic Regression Coefficient using caret

I am using the caret package in R to perform ridge logistic regression, and I am able to find the coefficient for each variable.

Question: how can I get the standard error of the coefficient for each variable produced by ridge logistic regression?

Here is the sample code that I have:

Ridge1 <- train(Group ~., data = train, method = 'glmnet',
               trControl = trainControl("cv", number = 10),
               tuneGrid = expand.grid(alpha = 0, 
                                      lambda = lambda),
               family="binomial")

Coefficients of the ridge logistic regression:

coef(Ridge1$finalModel, Ridge1$bestTune$lambda)

How can I get output like that of an ordinary logistic regression model (i.e. the standard error, Wald statistic, p-value, etc.)?

You don't get p-values and confidence intervals from ridge or glmnet regressions because it is very difficult to estimate the distribution of the estimator when a penalization term is present. The first part of the publication for the R package hdi touches on this, and you can check out posts such as this and this.
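To see why the penalty complicates inference, here is a minimal base-R sketch on simulated data showing how a ridge penalty pulls logistic coefficients toward zero, so the estimator is biased and the usual glm standard errors no longer apply. Everything in it (`ridge_logit`, `X_sim`, `y_sim`, the lambda values) is made up for illustration, and the lambda here is not on the same scale as glmnet's, which divides the log-likelihood by the sample size.

```r
# Simulated data; all names below are hypothetical, not from caret or glmnet.
set.seed(1)
n <- 500
X_sim <- cbind(x1 = rnorm(n), x2 = rnorm(n))
y_sim <- rbinom(n, 1, plogis(1.0 * X_sim[, "x1"] - 0.5 * X_sim[, "x2"]))

# Ridge-penalized logistic regression (no intercept) fitted by Newton's method.
# Minimizes: -loglik(beta) + (lambda / 2) * sum(beta^2)
ridge_logit <- function(X, y, lambda, iters = 50) {
  beta <- rep(0, ncol(X))
  for (i in seq_len(iters)) {
    p <- as.vector(plogis(X %*% beta))
    grad <- crossprod(X, y - p) - lambda * beta              # penalized score
    hess <- crossprod(X, X * (p * (1 - p))) + diag(lambda, ncol(X))
    beta <- beta + as.vector(solve(hess, grad))
  }
  setNames(beta, colnames(X))
}

# Unpenalized fit for reference, then ridge fits at increasing penalties.
unpenalized <- coef(glm(y_sim ~ X_sim - 1, family = binomial))
lambdas <- c(0, 10, 100, 1000)
shrunk <- sapply(lambdas, function(l) ridge_logit(X_sim, y_sim, l))
colnames(shrunk) <- paste0("lambda=", lambdas)

print(unpenalized)
print(round(shrunk, 4))
```

With `lambda = 0` the sketch reproduces the glm estimates. As lambda grows, the coefficients shrink, and the amount of shrinkage depends on a tuning parameter that was itself chosen from the data.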

We can try something like the following: take the optimal lambda from caret and pass it to another package, hdi, to estimate confidence intervals and p-values. I would interpret these with caution, though; they are very different from those of an ordinary logistic glm.

library(caret)
library(mlbench)
data(PimaIndiansDiabetes)

X = as.matrix(PimaIndiansDiabetes[,-ncol(PimaIndiansDiabetes)])
y = as.numeric(PimaIndiansDiabetes$diabetes)-1

lambda = 10^seq(-5,4,length.out=25)

Ridge1 <- train(x=X,y=factor(y), method = 'glmnet',family="binomial",
               trControl = trainControl("cv", number = 10),
               tuneGrid = expand.grid(alpha = 0, 
                                      lambda = lambda))

bestLambda = Ridge1$bestTune$lambda

Use hdi, but note that the coefficients will not be exactly the same as what you get with caret or glmnet:

library(hdi)
fit = ridge.proj(X,y,family="binomial",lambda=bestLambda)

# Name the columns: estimate, standard error, p-value
cbind(bhat = fit$bhat, se = fit$se, pval = fit$pval)

                  bhat           se         pval
pregnant  0.1137868935 0.0314432291 2.959673e-04
glucose   0.0329008177 0.0035806920 3.987411e-20
pressure -0.0122503030 0.0051224313 1.677961e-02
triceps   0.0009404808 0.0067935741 8.898952e-01
insulin  -0.0012293122 0.0008902878 1.673395e-01
mass      0.0787408742 0.0145166392 5.822097e-08
pedigree  0.9120151630 0.2927090989 1.834633e-03
age       0.0116844697 0.0092017927 2.041546e-01
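To get the Wald statistic and an interval as well, one option is a plain normal approximation applied to the estimates and standard errors. The sketch below uses the values printed above so that it runs on its own; in a live session you would use `fit$bhat` and `fit$se` instead. The normal approximation and the 95% level are assumptions of this sketch, not something hdi guarantees, and the same caution about interpretation applies.

```r
# Estimates and standard errors copied from the printed ridge.proj output above.
bhat <- c(pregnant = 0.1137868935, glucose = 0.0329008177,
          pressure = -0.0122503030, triceps = 0.0009404808,
          insulin = -0.0012293122, mass = 0.0787408742,
          pedigree = 0.9120151630, age = 0.0116844697)
se <- c(pregnant = 0.0314432291, glucose = 0.0035806920,
        pressure = 0.0051224313, triceps = 0.0067935741,
        insulin = 0.0008902878, mass = 0.0145166392,
        pedigree = 0.2927090989, age = 0.0092017927)

# Wald z statistic, two-sided normal p-value, and 95% normal-approximation interval.
z <- bhat / se
wald <- cbind(estimate = bhat,
              se       = se,
              z        = z,
              p        = 2 * pnorm(-abs(z)),
              lower    = bhat - qnorm(0.975) * se,
              upper    = bhat + qnorm(0.975) * se)

print(signif(wald, 4))
```

For this output, the p-values computed this way come out close to the ones hdi reported, which suggests the reported p-values here behave like two-sided normal tests on bhat / se.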
