简体   繁体   中英

R: Using a pasted formula in Sapply

I am trying to identify correlated explanatory variables and eliminate. I'm using Sapply to apply the regression to the variables I am interested in, and manually deleting the ones with FIVs > 10. However, when I try to reproduce this to quickly run for many vifs, I cannot manage to get my regression script to run with a pasted formula object containing the the names I want to keep. Below:

    regressiondata <- data.frame(matrix(ncol=9,nrow=100,runif(900,1,100)))
colnames(regressiondata) <- c("indep1","indep2","indep3","indep4","var1","var2","var3","var4","var5")
vifs1_model <- sapply(regressiondata[,indep_variables],function(x) vif(lm(x~var1+var2+var3+var4+var5, 
                                                                      data = regressiondata, 
                                                                      na.action=na.exclude)))
vifs1 <- rowMeans(vifs1_model)
formula_variables <- paste(names(vifs1),collapse="+")
final_model <- t(round(sapply(regressiondata[,indep_variables], 
           function(x) lm(x ~ formula_variables,data=regressiondata,na.action=na.exclude)$coef),2))

when I run "final_model" I get this error:

Error in t(round(sapply(regressiondata[, indep_variables], function(x) lm(x ~ : error in evaluating the argument 'x' in selecting a method for function 't': Error in model.frame.default(formula = x ~ formula_variables, data = regressiondata, : variable lengths differ (found for 'formula_variables')

I think you have a couple of issues:

  1. You are using sapply over a dataframe when it looks like you simply want to sapply over the vector of independent variable names
  2. Your last nested call to lm seems to mix expressions and strings

Here is my walk through. Your code refers to some missing objects so I have added in some lines I think you left out

library(car) # for fiv()
regressiondata <- data.frame(matrix(ncol=9,nrow=100,runif(900,1,100)))
colnames(regressiondata) <- c("indep1",
                              "indep2",
                              "indep3",
                              "indep4",
                              "var1",
                              "var2",
                              "var3",
                              "var4",
                              "var5")

indep_variables <- names(regressiondata)[1:4] # object did not exist

I broke out the anonymous functions for clarity:

f1 <- function(x) {
    vif(lm(x~var1+var2+var3+var4+var5,
        data = regressiondata, 
        na.action=na.exclude))
}

Now your regressions

vifs1_model <- sapply(regressiondata[,indep_variables], f1)
vifs1 <- rowMeans(vifs1_model)
formula_variables <- paste(names(vifs1),collapse="+")

I named this function that pulls the coefficients and handed lm a character vector (string) with the whole formula:

getCoefs <- function(x) {
    lm(paste(x, "~", formula_variables), data=regressiondata,
    na.action=na.exclude)$coef
}

Now, just sapply over the vector of names, then transpose and round:

final_model <- sapply(indep_variables, getCoefs)
final_model <- t(round(final_model ,2)) 

Here is a dplyr way of doing things. The bulk of the work is done by the sub_regression function, which conducts a regression, filters independent variables by vif, and then redoes the regression

library(dplyr)
library(tidyr)
library(magrittr)
library(car)

sub_regression = function(sub_data_frame)
  lm(independent_value ~ var1+var2+var3+var4+var5, 
     data = sub_data_frame , 
     na.action="na.exclude") %>%
  vif %>%
  Filter(function(x) x <= 10, .) %>%
  names %>%
  paste(collapse = " + ") %>%
  paste("independent_value ~ ", .) %>%
  as.formula %>%
  lm(. , sub_data_frame, na.action="na.exclude") %>%
  coefficients %>%
  round(3) %>%
  as.list %>%
  data.frame(check.names = FALSE)

matrix(ncol=9,nrow=100,runif(900,1,100)) %>%
  data.frame %>%
  setNames(c("indep1","indep2","indep3","indep4","var1","var2","var3","var4","var5")) %>%
  gather(independent_variable, independent_value, 
         indep1, indep2, indep3, indep4) %>%
  group_by(independent_variable) %>%
  do(sub_regression(.))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM