
R: extract regression coefficients from multiple regressions via lapply

I have a large dataset with several variables, one of which is a state variable, coded 1-50 for each state. I'd like to regress each of 28 variables on the remaining 27 variables of the dataset (there are 55 variables in total), separately for each state.

In other words, run a regression of variable1 on covariate1, covariate2, ..., covariate27 for observations where state==1. I'd then like to repeat this for variable1 for states 2-50, and then repeat the whole process for variable2, variable3, ..., variable28.

I think I've written the correct R code to do this, but the next thing I'd like to do is extract the coefficients, ideally into a coefficient matrix. Could someone please help me with this? Here's the code I've written so far:

for (num in 1:50) {

    #PUF is the data set I'm using

    #Subset the data by states
    PUFnum <- subset(PUF, state==num)

    #Attach data set with state specific data
    attach(PUFnum)

    #Run our prediction regression
    #the variables class1 through e19700 are the 27 covariates I want to use
    regression <- lapply(PUFnum,  function(z) lm(z ~ class1+class2+class3+class4+class5+class6+class7+
                                                     xtot+e00200+e00300+e00600+e00900+e01000+p04470+e04800+
                                                     e09600+e07180+e07220+e07260+e06500+e10300+
                                                     e59720+e11900+e18425+e18450+e18500+e19700))

    Beta <- lapply(regression, function(d) d<- coef(regression$d))


    detach(PUFnum)
}

This is another example of the classic Split-Apply-Combine problem, which can be addressed using the plyr package by @hadley. In your problem, you want to

  1. Split the data frame by state
  2. Apply the regression to each subset
  3. Combine the coefficients into a data frame.
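For reference, the same three steps can also be sketched in base R without plyr. This is only a minimal illustration on the built-in iris data, not the asker's PUF data; the names pieces, fits and coef_table are made up for the example.

```r
# Step 1: split the data frame into a list with one piece per group
pieces <- split(iris, iris$Species)

# Step 2: fit the same model to every piece
fits <- lapply(pieces, function(d) lm(Sepal.Length ~ Petal.Length, data = d))

# Step 3: combine the coefficient vectors into a matrix with one row per group
coef_table <- do.call(rbind, lapply(fits, coef))
print(coef_table)
```

Each row of coef_table is labelled with the group it came from, because split() names the list elements after the factor levels.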

I will illustrate it with the Cars93 dataset available in the MASS package. We are interested in the relationship between Horsepower and EngineSize by country of origin (Origin).

# LOAD LIBRARIES
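library(MASS)   # provides the Cars93 data
library(plyr)   # provides dlply() and ldply()

# SPLIT-APPLY-COMBINE
# dlply() splits Cars93 by Origin and fits lm() to each piece, returning a list of models
regressions <- dlply(Cars93, .(Origin), lm, formula = Horsepower ~ EngineSize)
# ldply() applies coef() to each model and stacks the results into a data frame
coefs <- ldply(regressions, coef)
coefs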

   Origin (Intercept) EngineSize
1     USA    33.13666   37.29919
2 non-USA    15.68747   55.39211

EDIT. For your example, substitute PUF for Cars93, state for Origin, and your own model formula (fm) for the formula.

I've cleaned up your code slightly:

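# variable1 stands in for one of the 28 response columns; replace it with the actual name
fm <- variable1 ~ class1+class2+class3+class4+class5+class6+class7+
          xtot+e00200+e00300+e00600+e00900+e01000+p04470+e04800+
          e09600+e07180+e07220+e07260+e06500+e10300+
          e59720+e11900+e18425+e18450+e18500+e19700

# Split PUF into a list with one data frame per state
PUFsplit <- split(PUF, PUF$state)
# Fit the same model to each state's data
mod <- lapply(PUFsplit, function(d) lm(fm, data=d))

# Coefficient matrix: one row per coefficient, one column per state
Beta <- sapply(mod, coef)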
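To make the shape of the result concrete, here is a small sketch of the same split/lapply/sapply pattern on the built-in mtcars data. It assumes mtcars as a stand-in for PUF and cyl as a stand-in for state; fm_demo, fits_demo, beta_demo and beta_by_group are illustrative names only.

```r
# mtcars stands in for PUF and cyl for state (illustrative substitutes only)
fm_demo <- mpg ~ wt + hp
fits_demo <- lapply(split(mtcars, mtcars$cyl), function(d) lm(fm_demo, data = d))

# sapply() binds the coefficient vectors column-wise:
# one row per coefficient, one column per group
beta_demo <- sapply(fits_demo, coef)

# Transpose to get one row per group and one column per coefficient
beta_by_group <- t(beta_demo)
print(beta_by_group)
```

Whether to keep the groups in columns or transpose them into rows is a matter of taste; the transposed form is usually easier to merge back onto state-level data.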

If you wanted, you could even put this all in one line:

Beta <- sapply(lapply(split(PUF, PUF$state), function(z) lm(fm, data=z)), coef)
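The code above handles a single response variable. One possible way to cover all 28 responses is to build one formula per response with reformulate() and repeat the per-state fit for each. The sketch below uses simulated data with hypothetical column names (state, x1, x2, y1, y2) rather than the real PUF columns, so treat it as a pattern to adapt, not a drop-in solution.

```r
set.seed(1)

# Simulated stand-in for PUF: 3 "states", 2 covariates, 2 responses (all names hypothetical)
toy <- data.frame(
  state = rep(1:3, each = 20),
  x1 = rnorm(60),
  x2 = rnorm(60)
)
toy$y1 <- 1 + 2 * toy$x1 + rnorm(60)
toy$y2 <- -1 + 0.5 * toy$x2 + rnorm(60)

responses  <- c("y1", "y2")
covariates <- c("x1", "x2")

# One formula per response, built from the column names
formulas <- lapply(responses, function(r) reformulate(covariates, response = r))
names(formulas) <- responses

# For each response: fit one model per state, then stack the coefficients
# into a matrix with one row per state and one column per coefficient
beta_list <- lapply(formulas, function(f) {
  fits <- lapply(split(toy, toy$state), function(d) lm(f, data = d))
  t(sapply(fits, coef))
})

print(beta_list$y1)
```

With the real data, responses would hold the 28 response column names and covariates the 27 covariate names, giving a list of 28 matrices with 50 rows each.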
