
Logistic regression: Drop insignificant predictor variables

I am using R to perform logistic regression on my data set. My data set has more than 50 variables.

The challenge is to write code in R that can assess the statistical validity of certain records and variables (e.g., p-values > .05) and eliminate records and variables from the model based on criteria such as that.
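For concreteness, here is a minimal sketch of the p-value criterion, assuming a data frame `df` with a binary response `y` (both names are hypothetical):

fit <- glm(y ~ ., data = df, family = binomial)           # logistic regression on all predictors
pvals <- summary(fit)$coefficients[, "Pr(>|z|)"]           # Wald p-value for each coefficient
names(pvals)[pvals > 0.05]                                 # predictors failing the .05 cutoff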

Is there any already-implemented method to do this? Any help or suggestions would be appreciated. Thank you.

Here is the implementation of a basic function that takes a set of predictor variables and eliminates them step by step until a linear model is found whose predictors all have p-values below the desired significance level.

reverse.step <- function(y, b, df, alpha = 0.05) {
  # y  = dependent variable name (as character), e.g. 'Height'
  # b  = vector of explanatory variable names (as characters), e.g. c('x1', 'x2', 'x3', ...)
  # df = data frame containing those variables
  fit.sum <- summary(lm(as.formula(paste(y, '~', paste(b, collapse = '+'))), data = df))
  cat(b, "\n")
  # p-values of the predictors (row 1 is the intercept, so skip it)
  pvals <- fit.sum$coefficients[2:nrow(fit.sum$coefficients), 4]
  # if even the largest p-value is below alpha, every remaining predictor is significant: stop
  if (pvals[which.max(pvals)] < alpha) {
    return(fit.sum)
  }
  # otherwise drop the least significant predictor and refit
  new.b <- names(pvals[-which.max(pvals)])
  if (length(new.b) == 0 || length(new.b) == length(b)) {
    return(fit.sum)
  } else {
    return(reverse.step(y, new.b, df, alpha))
  }
}

It may not be the most robust function, but it will get you started.
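For illustration, here is how it might be called on made-up data; `Height`, `x1`, `x2`, and `x3` are hypothetical names, not part of your data set:

set.seed(1)
df <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
df$Height <- 2 * df$x1 + 0.5 * df$x2 + rnorm(100)

# repeatedly drop the least significant predictor until all remaining p-values < 0.05
reverse.step('Height', c('x1', 'x2', 'x3'), df, alpha = 0.05)

For an actual logistic regression you would swap the `lm()` call for `glm(..., family = binomial)` and read the p-values from the "Pr(>|z|)" column instead; the elimination loop itself stays the same.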

You could also check out the regsubsets method in the leaps library.
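As a sketch of that route (using the same hypothetical df as above; note that regsubsets does best-subset selection for linear models, not logistic regression):

library(leaps)

subs <- regsubsets(Height ~ ., data = df, nvmax = 10)  # best subset of each size, up to 10 predictors
summary(subs)$which                                    # which predictors enter the best model of each size
summary(subs)$bic                                      # BIC per model size, for choosing among them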
