R: extract regression coefficients from multiple regressions via lapply
I have a large dataset with several variables, one of which is a state variable, coded 1-50 for each state. I'd like to run a regression of 28 variables on the remaining 27 variables of the dataset (there are 55 variables total), separately for each state.
In other words, run a regression of variable1 on covariate1, covariate2, ..., covariate27 for observations where state==1. I'd then like to repeat this for variable1 for states 2-50, and then repeat the whole process for variable2, variable3, ..., variable28.
I think I've written the correct R code to do this, but the next thing I'd like to do is extract the coefficients, ideally into a coefficient matrix. Could someone please help me with this?
Here's the code I've written so far:
for (num in 1:50) {
  # PUF is the data set I'm using
  # Subset the data by states
  PUFnum <- subset(PUF, state==num)
  # Attach data set with state specific data
  attach(PUFnum)
  # Run our prediction regression
  # the variables class1 through e19700 are the 27 covariates I want to use
  regression <- lapply(PUFnum, function(z) lm(z ~ class1+class2+class3+class4+class5+class6+class7+
                       xtot+e00200+e00300+e00600+e00900+e01000+p04470+e04800+
                       e09600+e07180+e07220+e07260+e06500+e10300+
                       e59720+e11900+e18425+e18450+e18500+e19700))
  Beta <- lapply(regression, function(d) d<- coef(regression$d))
  detach(PUFnum)
}
This is another example of the classic Split-Apply-Combine problem, which can be addressed using the plyr package by @hadley. In your problem, you want to split the data set by state, fit the regression to each subset, and combine the coefficients into a single table.
I will illustrate it with the Cars93 dataset available in the MASS library. We are interested in the relationship between Horsepower and EngineSize, estimated separately for each country of Origin.
# LOAD LIBRARIES
require(MASS); require(plyr)
# SPLIT-APPLY-COMBINE
regressions <- dlply(Cars93, .(Origin), lm, formula = Horsepower ~ EngineSize)
coefs <- ldply(regressions, coef)
# coefs:
#    Origin (Intercept) EngineSize
# 1     USA    33.13666   37.29919
# 2 non-USA    15.68747   55.39211
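For readers without plyr, the same three steps can be sketched in base R. This is only an illustration under assumptions: it uses the built-in mtcars dataset so that it runs without extra packages, with cyl standing in for the grouping variable (Origin or state) and mpg ~ wt standing in for the model.

```r
# Split: one data frame per group (cyl is a stand-in for Origin / state)
pieces <- split(mtcars, mtcars$cyl)

# Apply: fit the same model to every piece
fits <- lapply(pieces, function(d) lm(mpg ~ wt, data = d))

# Combine: stack the coefficient vectors, one row per group
coef_matrix <- do.call(rbind, lapply(fits, coef))
coefs <- data.frame(cyl = names(fits), coef_matrix,
                    check.names = FALSE, row.names = NULL)
print(coefs)
```

The result has one row per group and one column per coefficient, which is the same layout that ldply returns.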
EDIT. For your example, substitute PUF for Cars93, state for Origin and fm for the formula.
I've cleaned up your code slightly:
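# variable1 is a placeholder: put the name of the response column you want to model here
fm <- variable1 ~ class1+class2+class3+class4+class5+class6+class7+
xtot+e00200+e00300+e00600+e00900+e01000+p04470+e04800+
e09600+e07180+e07220+e07260+e06500+e10300+
e59720+e11900+e18425+e18450+e18500+e19700
# Split the data into one data frame per state
PUFsplit <- split(PUF, PUF$state)
# Fit the same model to each state's subset
mod <- lapply(PUFsplit, function(z) lm(fm, data=z))
# Coefficient matrix: one row per coefficient, one column per state
Beta <- sapply(mod, coef)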
If you wanted, you could even put this all in one line:
Beta <- sapply(lapply(split(PUF, PUF$state), function(z) lm(fm, data=z)), coef)
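The question also asks to repeat the per-state fit for each of the 28 response variables. One possible way to do that is to build each formula with reformulate() and loop over the response names. The sketch below is an assumption-laden illustration, not the poster's data: mtcars stands in for PUF, cyl for state, mpg and qsec for variable1, variable2, ..., and wt and hp for the 27 covariates; fit_coefs is a hypothetical helper name.

```r
# Stand-ins (assumptions): replace these with the real column names in PUF
responses  <- c("mpg", "qsec")           # e.g. variable1 ... variable28
covariates <- c("wt", "hp")              # e.g. class1 ... e19700
groups     <- split(mtcars, mtcars$cyl)  # e.g. split(PUF, PUF$state)

# Hypothetical helper: for one response, return a coefficient matrix
# with one row per coefficient and one column per group
fit_coefs <- function(resp) {
  fm <- reformulate(covariates, response = resp)
  sapply(groups, function(d) coef(lm(fm, data = d)))
}

# Named list with one coefficient matrix per response variable
Beta <- lapply(setNames(responses, responses), fit_coefs)
print(Beta$mpg)
```

Each element of Beta can be transposed with t() if groups are preferred as rows.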