
Logistic regression: Drop insignificant predictor variables

I am using R to perform logistic regression on my data set. My data set has more than 50 variables.

The challenge is to write code in R that can assess the statistical validity of certain records and variables (e.g., p-values > .05) and eliminate records and variables from the model based on criteria such as that.
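For concreteness, here is a minimal sketch of the p-value criterion, assuming a data frame `df` with a binary response `y` (both names are hypothetical):

fit <- glm(y ~ ., data = df, family = binomial)           # logistic regression on all predictors
pvals <- summary(fit)$coefficients[, "Pr(>|z|)"]           # Wald p-value for each coefficient
names(pvals)[pvals > 0.05]                                 # predictors failing the .05 cutoff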

Is there any already-implemented method to do this? Any help or suggestions would be appreciated. Thank you.

Here is the implementation of a basic function that takes a set of predictor variables and eliminates them step by step until a linear model is found whose predictors all have p-values below the desired significance level.

reverse.step <- function(y, b, df, alpha = 0.05) {
  # y  = dependent variable name (as character), e.g. 'Height'
  # b  = vector of explanatory variable names (as characters), e.g. c('x1', 'x2', 'x3', ...)
  # df = data frame containing those variables
  fit.sum <- summary(lm(as.formula(paste(y, '~', paste(b, collapse = '+'))), data = df))
  cat(b, "\n")
  # p-values of the predictors (row 1 is the intercept, so skip it)
  pvals <- fit.sum$coefficients[2:nrow(fit.sum$coefficients), 4]
  # if even the largest p-value is below alpha, every remaining predictor is significant: stop
  if (pvals[which.max(pvals)] < alpha) {
    return(fit.sum)
  }
  # otherwise drop the least significant predictor and refit
  new.b <- names(pvals[-which.max(pvals)])
  if (length(new.b) == 0 || length(new.b) == length(b)) {
    return(fit.sum)
  } else {
    return(reverse.step(y, new.b, df, alpha))
  }
}

It may not be the most robust function, but it will get you started.
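For illustration, here is how it might be called on made-up data; `Height`, `x1`, `x2`, and `x3` are hypothetical names, not part of your data set:

set.seed(1)
df <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
df$Height <- 2 * df$x1 + 0.5 * df$x2 + rnorm(100)

# repeatedly drop the least significant predictor until all remaining p-values < 0.05
reverse.step('Height', c('x1', 'x2', 'x3'), df, alpha = 0.05)

For an actual logistic regression you would swap the `lm()` call for `glm(..., family = binomial)` and read the p-values from the "Pr(>|z|)" column instead; the elimination loop itself stays the same.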

You could also check out the regsubsets method in the leaps library.
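As a sketch of that route (using the same hypothetical df as above; note that regsubsets does best-subset selection for linear models, not logistic regression):

library(leaps)

subs <- regsubsets(Height ~ ., data = df, nvmax = 10)  # best subset of each size, up to 10 predictors
summary(subs)$which                                    # which predictors enter the best model of each size
summary(subs)$bic                                      # BIC per model size, for choosing among them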
