How to add specific conditions to stepAIC

Question

I am running a regression with 37 variables, and I am using stepAIC to perform model selection. I do NOT want a predictive model. I just want to find out which variables have the best explanatory power.

My current code looks like:

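library(MASS)  # provides stepAIC
# DEP is my dependent variable, and AUC is an independent variable I want to force into the model.
fitObject <- lm(DEP ~ ., data = mydata)
# k = log(n), with n the number of observations, gives the BIC penalty.
DEP.select <- stepAIC(fitObject, direction = 'both', scope = list(lower = ~AUC), trace = FALSE, k = log(nrow(mydata)))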

The problem is that many of my variables are highly correlated, and the model stepAIC returns contains several of those highly correlated variables. Note that I have forced AUC into the model, so multicollinearity is especially problematic when variables highly correlated with AUC are selected.

Is there a way to specify thresholds in the function for the correlation between variables or for the p-values of the coefficients?

Any comments on other approaches that could solve my problem are also welcome.

Thank you!

Answer 1

Perhaps the Variance Inflation Factor will work better for you. This article explains some of the logic: http://en.wikipedia.org/wiki/Variance_inflation_factor

Example use:

v=ezvif(df,yvar ='columnNameOfWhichYouAreTryingToPredict')

Here is the function I wrote that combines VIF::vif with cross-validation.

require(VIF)
require(cvTools);
#returns selected variables using VIF and kfolds cross validation 
ezvif=function(df,yvar,folds=5,trace=F){
  f=cvFolds(nrow(df),K=folds);
  findings=list();
  for(v in names(df)){
    if(v==yvar)next;
    findings[[v]]=0; 
  }
  for(i in 1:folds){   
    rows=f$subsets[f$which!=i]
    y=df[rows,yvar];
    xdf=df[rows,names(df) != yvar]; #remove output var    
    vifResult=vif(y,xdf,trace=trace,subsize=min(200,floor(nrow(xdf))))
    for(v in names(xdf)[vifResult$select]){
      findings[[v]]=findings[[v]]+1; #vote
    }
  }
  findings=(sort(unlist(findings),decreasing = T))    
  if(trace) print(findings[findings>0]); 
  return( c(yvar,names(findings[findings==findings[1]])) )  
}
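
For readers who want to see the quantity the linked article describes, here is a minimal sketch of the textbook variance inflation factor in base R. It does not use the VIF package or the ezvif function above; classic_vif is a hypothetical helper name, and the data are synthetic. Each predictor is regressed on all the others, and VIF_j = 1 / (1 - R_j^2).

```r
# Textbook VIF: regress each predictor on all the others, then
# VIF_j = 1 / (1 - R_j^2). `classic_vif` is a hypothetical helper,
# not a function from the VIF package.
classic_vif <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
    1 / (1 - r2)
  })
}

# Synthetic data (assumption): x2 is nearly a copy of x1, x3 is independent.
set.seed(1)
x1 <- rnorm(200)
toy <- data.frame(x1 = x1,
                  x2 = x1 + rnorm(200, sd = 0.1),
                  x3 = rnorm(200))

vifs <- classic_vif(toy)
print(round(vifs, 1))
```

A common rule of thumb treats VIF values above roughly 5 or 10 as a sign of problematic collinearity; in this toy data x1 and x2 should be far above that and x3 close to 1.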

Answer 2

I would recommend removing the variables with high correlations. The libraries caret and corrplot can help:

library(corrplot)
library(caret)
dm = data.matrix(mydata[, names(mydata) != 'DEP'])  # without your outcome variable

Visualize your correlations, clustering highly correlated variables together:

corrplot(cor(dm), order = 'hclust')

And find the indices of variables that you could remove due to high (>0.75) correlations:

findCorrelation(cor(dm), cutoff = 0.75)

Removing these variables can improve your model. After removing the variables, continue with stepAIC as you described in your question.
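
One thing the steps above do not guarantee is that AUC itself survives the correlation filter. The following is a rough sketch of the whole workflow with that constraint added, using only base R and MASS rather than caret. The helper drop_correlated is hypothetical, the data are synthetic, and the column names (AUC, A2, B, C, DEP) are assumptions made for illustration.

```r
library(MASS)  # stepAIC

# Hypothetical helper: greedily keep predictors whose absolute correlation
# with every already-kept predictor is at most `cutoff`. Variables named in
# `keep` are visited first and are never dropped.
drop_correlated <- function(X, keep = character(0), cutoff = 0.75) {
  r <- abs(cor(X))
  kept <- character(0)
  for (v in c(keep, setdiff(colnames(X), keep))) {
    if (v %in% keep || all(r[v, kept] <= cutoff)) kept <- c(kept, v)
  }
  kept
}

# Synthetic data (assumption): A2 is strongly correlated with AUC.
set.seed(42)
n <- 300
AUC <- rnorm(n)
mydata <- data.frame(AUC = AUC,
                     A2  = AUC + rnorm(n, sd = 0.2),
                     B   = rnorm(n),
                     C   = rnorm(n))
mydata$DEP <- 1 + 2 * mydata$AUC + 1.5 * mydata$B + rnorm(n)

# Filter the predictors, protecting AUC, then run stepAIC as in the question.
predictors <- setdiff(names(mydata), "DEP")
kept <- drop_correlated(mydata[, predictors], keep = "AUC", cutoff = 0.75)
fit <- lm(reformulate(kept, response = "DEP"), data = mydata)
sel <- stepAIC(fit, direction = "both", scope = list(lower = ~AUC),
               trace = FALSE, k = log(n))
print(formula(sel))
```

Because the filter visits AUC first, anything too closely correlated with it (A2 here) is removed before stepAIC ever sees it. The greedy order matters: a different column order can keep a different member of a correlated group.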

Answer 3

To assess multicollinearity between predictors when running the dredge function (MuMIn package), include the following max.r function as the "extra" argument:

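max.r <- function(x){
  # Correlation matrix of the estimated coefficients (intercept included)
  corm <- cov2cor(vcov(x))
  corm <- as.matrix(corm)
  if (length(corm)==1){
    # Intercept-only model: nothing to compare
    corm <- 0
    max(abs(corm))
  } else if (length(corm)==4){
    # Intercept plus one predictor: no pair of predictors to compare
    cormf <- corm[2:nrow(corm),2:ncol(corm)]
    cormf <- 0
    max(abs(cormf))
  } else {
    # Drop the intercept row and column, zero the diagonal, return the largest |r|
    cormf <- corm[2:nrow(corm),2:ncol(corm)]
    diag(cormf) <- 0
    max(abs(cormf))
  }
}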

Then simply run dredge, specifying the number of predictor variables and including the max.r function:

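library(MuMIn)  # provides dredge, get.models, model.sel
options(na.action = na.fail)  # dredge requires the global model to be fitted with na.fail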
Allmodels <- dredge(Fullmodel, rank = "AIC", m.lim=c(0, 3), extra= max.r) 
Allmodels[Allmodels$max.r<=0.6, ] ##Subset models with max.r <=0.6 (not collinear)
NCM <- get.models(Allmodels, subset = max.r<=0.6) ##Retrieve models with max.r <=0.6 (not collinear)
model.sel(NCM) ##Final model selection table

This works for lme4 models. For nlme models see: https://github.com/rojaff/dredge_mc
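
To make it clearer what the max.r value represents, here is a small base R sketch applied to ordinary lm fits. As written, the function works from vcov(x), so it measures the correlation between coefficient estimates rather than the raw correlation between predictor columns. max_r_compact is a hypothetical, condensed restatement that I believe returns the same value as max.r for models with an intercept; the data are synthetic.

```r
# Hypothetical condensed version of max.r: largest absolute correlation
# between non-intercept coefficient estimates, or 0 when there are fewer
# than two predictors.
max_r_compact <- function(x) {
  corm <- as.matrix(cov2cor(vcov(x)))
  if (nrow(corm) < 3) return(0)
  cormf <- corm[-1, -1, drop = FALSE]  # drop the intercept row and column
  diag(cormf) <- 0
  max(abs(cormf))
}

# Synthetic data (assumption): x2 is nearly collinear with x1, x3 is not.
set.seed(7)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.2)
x3 <- rnorm(n)
y  <- 1 + x1 + x3 + rnorm(n)

fit_null      <- lm(y ~ 1)
fit_collinear <- lm(y ~ x1 + x2)
fit_indep     <- lm(y ~ x1 + x3)

print(c(null      = max_r_compact(fit_null),
        collinear = max_r_compact(fit_collinear),
        indep     = max_r_compact(fit_indep)))
```

With a 0.6 threshold as in the answer, the model containing both x1 and x2 would be filtered out, while the model with x1 and x3 would be retained.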
