Logistic regression: Drop Insignificant Predictor Variables
I am using R to perform logistic regression on my data set. My data set has more than 50 variables.
The challenge is to write R code that can assess the statistical validity of certain records and variables (e.g., p-values > .05) and eliminate records and variables from the model based on criteria such as that.
Is there any already implemented method to do this? Any help or suggestion will be appreciated. Thank you.
Here is a basic function that takes a set of predictor variables and eliminates them step by step until it arrives at a linear model whose predictors are all below the desired significance level.
reverse.step <- function(y, b, df, alpha = 0.05) {
  # y     = dependent variable name (as character), e.g. 'Height'
  # b     = vector of explanatory variable names (as characters),
  #         e.g. c('x1', 'x2', 'x3', ...)
  # df    = data frame containing y and the columns named in b
  # alpha = significance level; the least significant predictor is
  #         dropped until all remaining p-values fall below alpha
  form <- as.formula(paste(y, '~', paste(b, collapse = ' + ')))
  fit.sum <- summary(lm(form, data = df))
  cat(b)
  cat("\n")
  # p-values of the predictors (row 1 is the intercept, so skip it)
  pvals <- fit.sum$coefficients[-1, 4]
  if (max(pvals) < alpha) {
    return(fit.sum)  # all remaining predictors are significant
  }
  # drop the predictor with the largest p-value and refit
  new.b <- names(pvals)[-which.max(pvals)]
  if (length(new.b) == 0 || length(new.b) == length(b)) {
    return(fit.sum)
  } else {
    return(reverse.step(y, new.b, df, alpha))
  }
}
It may not be the most robust function, but it will get you started.
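Note also that base R already ships a stepwise-elimination routine, step(), which works on glm fits as well (it eliminates by AIC rather than by raw p-values, but in the same backward spirit). A minimal sketch on simulated data — the variable names and coefficients here are invented for illustration:

```r
set.seed(1)
n <- 200
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
# binary outcome driven by x1 and x2 only; x3 is pure noise
p <- plogis(1.5 * df$x1 - 1.0 * df$x2)
df$y <- rbinom(n, 1, p)

# fit the full logistic model, then eliminate backwards by AIC
full <- glm(y ~ x1 + x2 + x3, data = df, family = binomial)
reduced <- step(full, direction = 'backward', trace = 0)
formula(reduced)
```

With strong true effects, the noise predictor is typically dropped while x1 and x2 survive; inspect summary(reduced) to confirm.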
You could also check out the regsubsets method in the leaps package.