I feel like this should be the easiest thing in the world. Firstly, I am relatively new to R, but I wanted to learn it. That being said, my experience so far suggests that R is not very intuitive. What I was able to figure out in Python within a couple hours has so far taken 2 days without result in R.
I want to regress a selection of dependent variables within a selection of panel data. I have several variables with various normalization curves. I would like to be able to iterate through many instead of writing regressions 1 at a time.
I want to do something like the following: plm(dependent ~ loopedvar + var2 + var3 + var4, data=mydata, model=c("within"))
I have created a varlist using grep, which is actually very easy. Now I want to substitute in the variables in varlist 1-by-1 as the 'loopedvar.'
In Python with SPSS I would do something like:
import spss  # SPSS Python integration plug-in
# Submit one AREG command per variable in varlist
for testvariable in varlist:
    spss.Submit("""AREG dependent WITH
{}
var2
var3
var4
/METHOD PW.
""".format(testvariable))
I have also found this tutorial http://www.ats.ucla.edu/stat/r/pages/looping_strings.htm , but I cannot get it to work, and I do not understand the *apply functions in R. For one, when writing lapply(varlist, function (x) [model]) how does the varlist[var] know where to go?
I have tried for loops with paste and substitute with varying errors.
for (var in 1:length(varlist)) {
  models<-plm(substitute(dependent ~ i, list(i=as.name(paste0(var)), as.name("var2"), as.name("var3"), as.name("var4")) data=mydata, model=c("within")))
}
Throws "Error: unexpected symbol in: [...(var4"")) data]"
for (var in 1:length(varlist)) {
  models<-summary(plm(paste0("dependent ~ ",var," + var2 + var3 + var4"), data=mydata, model=c("within")))
}
Throws "Error: inherits(object, "formula") is not TRUE"
These errors are super unhelpful, and I'm just sick of guessing. R syntax is not very straightforward in my estimation, and the chances that I will get it right are slim.
Please don't post a non-response. R people have a penchant for that in my experience. If I have insufficiently described my issue or desires just request more information, and I will be happy to oblige.
EDIT: I forgot the index argument in the plm function. It should be there.
Definitely one of the harder things to wrap one's head around in R is that it does not like the "macro" approach used in some other languages (I learned to code Stata before branching out into R). Almost always there is a way to use the *apply functions instead of a loop-with-macro-reference to do what you want to do.
Here is how I would approach your particular problem.
data <- data.frame(dep = runif(100), var1 = runif(100), var2 = runif(100), var3 = runif(100)) # Create some fake data
varlist <- c("var1", "var2", "var3") # Declare your varlist as a vector
lm.results <- lapply(data[, varlist], function(x) lm(dep ~ x, data = data)) # Run the regression on each variable
Let me break that last line down a little bit. A data frame in R is actually a list with extra structure, where each item in the list is a variable/column. So lapply(data[,varlist], FUN) will evaluate the function FUN using each column in data[,varlist], i.e. each variable in data which is named in varlist.
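To make the "how does it know where to go" part concrete, here is a minimal sketch using only base R. The data and the names col.means and col.means2 are made up for illustration. It shows that a data frame really is a list of columns, and that lapply simply hands each element to the function as its first argument.

```r
# Fake data with illustrative names (not the asker's real panel).
set.seed(1)
data <- data.frame(dep = runif(100), var1 = runif(100),
                   var2 = runif(100), var3 = runif(100))
varlist <- c("var1", "var2", "var3")

# A data frame is a list of columns, so lapply() visits one column per call.
is.list(data)                               # TRUE
col.means <- lapply(data[, varlist], mean)  # mean() receives each column in turn
names(col.means)                            # "var1" "var2" "var3"

# Iterating over the names instead makes the hand-off explicit:
# each element of varlist is passed to the function as `v`.
col.means2 <- lapply(varlist, function(v) mean(data[[v]]))
```

Both calls compute the same numbers; the only difference is whether the function receives the column itself or its name.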
Since there isn't a built-in function for what you need (there often isn't), you declare it on the fly. function(x) lm(dep ~ x, data=data) takes a variable as an argument (in the lapply call, each variable in varlist) and regresses dep on that variable. The results will be stored in a new list called lm.results.
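If you also want to keep the fixed controls from your question, one option is to build each formula from the variable names. Your second attempt likely failed for two reasons: paste0 returns a character string rather than a formula, and var was the loop index (1, 2, ...) rather than the variable name. The sketch below uses reformulate from base R and lm so that it runs without extra packages. The data, the norm1/norm2 column names, and the controls vector are assumptions for illustration only.

```r
# Fake data; column names are hypothetical stand-ins for the real panel.
set.seed(42)
mydata <- data.frame(dependent = runif(100),
                     norm1 = runif(100), norm2 = runif(100),
                     var2 = runif(100), var3 = runif(100), var4 = runif(100))

varlist  <- grep("^norm", names(mydata), value = TRUE)  # the looped variables
controls <- c("var2", "var3", "var4")                   # held fixed in every model

# One model per looped variable: dependent ~ <v> + var2 + var3 + var4
models <- lapply(varlist, function(v) {
  f <- reformulate(c(v, controls), response = "dependent")  # a real formula object
  lm(f, data = mydata)
})
names(models) <- varlist

# Pull the coefficient on the looped variable out of each fit.
looped.coefs <- sapply(varlist, function(v) coef(models[[v]])[[v]])
```

Since plm also takes a formula as its first argument, replacing the lm call with plm(f, data = mydata, model = "within") plus your own index argument should work the same way, though I have not run that against your data.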