R Looping through independent variables in panel model

I feel like this should be the easiest thing in the world. Firstly, I am relatively new to R, but I wanted to learn it. That being said, my experience so far suggests that R is not very intuitive. What I was able to figure out in Python within a couple hours has so far taken 2 days without result in R.

I want to run regressions over a selection of independent variables in a panel data set. I have several variables with various normalization curves, and I would like to iterate through them instead of writing the regressions one at a time.

I want to do something like the following: plm(dependent ~ loopedvar + var2 + var3 + var4, data=mydata, model=c("within"))

I have created a varlist using grep, which is actually very easy. Now I want to substitute in the variables in varlist 1-by-1 as the 'loopedvar.'

In python with SPSS I would do something like

import spss  # SPSS Python integration module

# varlist is the list of variable names built earlier
for testvariable in varlist:
    spss.Submit("""AREG dependent WITH
{}
var2
var3
var4
 /METHOD PW.
""".format(testvariable))

I have also found this tutorial http://www.ats.ucla.edu/stat/r/pages/looping_strings.htm , but I cannot get it to work, and I do not understand the *apply functions in R. For one, when writing lapply(varlist, function (x) [model]) how does the varlist[var] know where to go?

I have tried for loops with paste and substitute with varying errors.

for (var in 1:length(varlist)) {
     models<-plm(substitute(dependent ~ i, list(i=as.name(paste0(var)), as.name("var2"), as.name("var3"), as.name("var4")) data=mydata, model=c("within")))
}

Throws "Error: unexpected symbol in: [...(var4"")) data]"

for (var in 1:length(varlist)) {
     models<-summary(plm(paste0("dependent ~ ",var," + var2 + var3 + var4"), data=mydata, model=c("within")))
}

Throws "Error: inherits(object, "formula") is not TRUE"

These errors are super unhelpful, and I'm just sick of guessing. R syntax is not very straightforward in my estimation, and the chances that I will get it right are slim.

Please don't post a non-response. R people have a penchant for that in my experience. If I have insufficiently described my issue or desires just request more information, and I will be happy to oblige.

EDIT: I forgot the index argument in plm function. It should be there.

Definitely one of the harder things to wrap one's head around in R is that it does not like the "macro" approach used in some other languages (I learned to code Stata before branching out into R). Almost always there is a way to use the *apply functions instead of a loop-with-macro-reference to do what you want to do.

Here is how I would approach your particular problem.

data <- data.frame(dep = runif(100), var1=runif(100), var2=runif(100),var3=runif(100)) #Create some fake data

varlist<-c("var1","var2","var3") # Declare your varlist as a vector

lm.results<- lapply(data[,varlist],function(x) lm(dep ~ x, data=data)) # run the regression on each variable.

Let me break that last line down a little bit. A data frame in R is actually a list with extra structure, where each item in the list is a variable/column. So lapply(data[,varlist],FUN) will evaluate the function FUN on each column in data[,varlist], i.e. each variable in data that is named in varlist.
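To make the "data frame is a list" point concrete, here is a small sketch using a throwaway data frame (df and its columns a and b are made up for illustration). lapply hands each column to the function in turn and returns a list named after the columns.

```r
# A tiny made-up data frame, for illustration only
df <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))

is.list(df)   # a data frame is a list whose elements are its columns
names(df)     # the element names are the column names

# lapply calls mean() once per column and keeps the column names
col_means <- lapply(df, mean)
col_means$a
col_means$b
```

This also answers the "how does it know where to go" question: lapply itself passes each element in as the first argument of the function you give it, so there is no index to manage.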

Since there isn't a built-in function for what you need (there often isn't), you declare it on the fly. function(x) lm(dep ~ x, data=data) takes a column as its argument (in the lapply call, each column of data named in varlist) and regresses dep on that column. The results will be stored in a new list called lm.results.
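If you want to keep the other regressors in the model and have the coefficient carry the variable's real name, one option is to loop over the variable names and build a formula for each one. The following is a minimal sketch under assumptions: the data are fake, the norm_ column names are hypothetical stand-ins for whatever your grep returns, and it uses lm so that it runs with base R only. reformulate() builds a real formula object from character strings. That is the piece missing from the paste0 attempt in the question, which passes a character string where a formula is expected and also pastes in the loop index var rather than varlist[var].

```r
set.seed(1)  # reproducible fake data

# Fake data; the norm_* columns are hypothetical candidate regressors
mydata <- data.frame(
  dependent = runif(100),
  norm_a = runif(100), norm_b = runif(100),
  var2 = runif(100), var3 = runif(100), var4 = runif(100)
)

# Build the list of variables to loop over, as in the question
varlist <- grep("^norm_", names(mydata), value = TRUE)
names(varlist) <- varlist  # so the result list is named by variable

# Fit one model per looped variable, keeping var2..var4 as controls
models <- lapply(varlist, function(v) {
  f <- reformulate(c(v, "var2", "var3", "var4"), response = "dependent")
  lm(f, data = mydata)
})

# Pull out the coefficient on the looped variable from each model
sapply(models, function(m) coef(m)[2])
```

For the panel case, the same formula f should be usable in the plm call from the question in place of the hand-written formula, together with the index argument and model="within". I have not run that here, so treat it as an assumption to check against your data.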
