Extract Group Regression Coefficients in R w/ PLYR

I'm trying to run a regression for every zipcode in my dataset and save the coefficients to a data frame, but I'm having trouble.

Whenever I run the code below, I get a data frame called "coefficients" containing every zipcode, but the intercept and coefficient for every zipcode are equal to the results of the simple regression lm(Sealed$hhincome ~ Sealed$square_footage).

When I run the code as indicated in Ranmath's example at the link below, everything works as expected. I'm new to R after many years with STATA, so any help would be greatly appreciated :)

R extract regression coefficients from multiply regression via lapply command

library(plyr)
Sealed <- read.csv("~/Desktop/SEALED.csv")

x <- function(df) {
      lm(Sealed$hhincome ~ Sealed$square_footage)
}

regressions <- dlply(Sealed, .(Sealed$zipcode), x)
coefficients <- ldply(regressions, coef)

Because dlply takes a ... argument that allows additional arguments to be passed to the function, you can make things even simpler:

dlply(Sealed,.(zipcode),lm,formula=hhincome~square_footage)

The first two arguments to lm are formula and data. Since formula is specified here by name, lm will pick up the next argument it is given (the relevant zipcode-specific chunk of Sealed) as the data argument.
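A minimal sketch of that one-liner end to end. The real SEALED.csv is not available here, so the data frame below is a made-up stand-in that only borrows the column names from the question; the slopes and noise level are arbitrary assumptions.

```r
library(plyr)

# Synthetic stand-in for SEALED.csv (all values are invented for illustration)
set.seed(1)
Sealed <- data.frame(
  zipcode        = rep(c("10001", "10002", "10003"), each = 20),
  square_footage = runif(60, 500, 3000),
  stringsAsFactors = FALSE
)
# Give each zipcode a different true slope so the per-group fits differ
slopes <- c("10001" = 10, "10002" = 20, "10003" = 30)
Sealed$hhincome <- 20000 +
  unname(slopes[Sealed$zipcode]) * Sealed$square_footage +
  rnorm(60, sd = 500)

# formula is matched by name, so each zipcode chunk lands in lm's data argument
regressions  <- dlply(Sealed, .(zipcode), lm, formula = hhincome ~ square_footage)

# One row of coefficients per zipcode
coefficients <- ldply(regressions, coef)
print(coefficients)
```

With this setup each row of coefficients should show a different slope, which is a quick way to check that the regressions really ran per group.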

You are applying the function:

x <- function(df) {
      lm(Sealed$hhincome ~ Sealed$square_footage)
}

to each subset of your data, so we shouldn't be surprised that the output each time is exactly

lm(Sealed$hhincome ~ Sealed$square_footage)

right? Try replacing Sealed with df inside your function. That way you're referring to the variables in each individual piece passed to the function, not the whole variable in the data frame Sealed.
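To make the difference concrete, here is a hedged side-by-side sketch of the original function and a corrected one. The data are synthetic (invented values, column names taken from the question), and the names buggy and fixed are made up for this illustration. The corrected version passes data = df to lm rather than writing df$hhincome ~ df$square_footage; that is a small variation on the suggestion above that has the same effect and gives cleaner coefficient names.

```r
library(plyr)

# Synthetic stand-in for SEALED.csv (all values are invented for illustration)
set.seed(42)
Sealed <- data.frame(
  zipcode        = rep(c("10001", "10002", "10003"), each = 20),
  square_footage = runif(60, 500, 3000),
  stringsAsFactors = FALSE
)
slopes <- c("10001" = 10, "10002" = 20, "10003" = 30)
Sealed$hhincome <- 20000 +
  unname(slopes[Sealed$zipcode]) * Sealed$square_footage +
  rnorm(60, sd = 500)

# Original: ignores df and always fits the whole data frame
buggy <- function(df) {
  lm(Sealed$hhincome ~ Sealed$square_footage)
}

# Corrected: fits only the piece that dlply passes in
fixed <- function(df) {
  lm(hhincome ~ square_footage, data = df)
}

buggy_fits <- dlply(Sealed, .(zipcode), buggy)
fixed_fits <- dlply(Sealed, .(zipcode), fixed)

print(ldply(buggy_fits, coef))  # every row repeats the pooled regression
print(ldply(fixed_fits, coef))  # one distinct fit per zipcode
```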

The issue is not with plyr but rather with the definition of the function. You are defining a function that takes an argument, but never doing anything with that argument.

As an analogy,

myFun <- function(x) {
  3 * 7
}

> myFun(2)
[1] 21
> myFun(578)
[1] 21

If you run this function on different values of x, it will still give you 21, no matter what x is. That is, there is no reference to x within the function. In my silly example, the correction is obvious; in your function above, the confusion is understandable. The $hhincome and $square_footage look like they should serve as the variables.

But you want your x to vary over what comes before the $. As @Joran correctly pointed out, swap Sealed$hhincome with df$hhincome (and same for $squ..) and that will help.
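The same point can be checked without plyr. The sketch below is a base R analogue (split plus lapply, in the spirit of the linked lapply question) that uses the df$ form literally; it is an illustration on invented data, not the code from either thread, and fit_one is a hypothetical name.

```r
# Synthetic stand-in for SEALED.csv (all values are invented for illustration)
set.seed(7)
Sealed <- data.frame(
  zipcode        = rep(c("10001", "10002", "10003"), each = 20),
  square_footage = runif(60, 500, 3000),
  stringsAsFactors = FALSE
)
slopes <- c("10001" = 10, "10002" = 20, "10003" = 30)
Sealed$hhincome <- 20000 +
  unname(slopes[Sealed$zipcode]) * Sealed$square_footage +
  rnorm(60, sd = 500)

# The function now refers to its argument df, not to the global Sealed
fit_one <- function(df) {
  lm(df$hhincome ~ df$square_footage)
}

# split() produces one data frame per zipcode; lapply() fits each piece
models <- lapply(split(Sealed, Sealed$zipcode), fit_one)

# Stack the coefficient vectors into a matrix with one row per zipcode
coef_table <- do.call(rbind, lapply(models, coef))
print(coef_table)
```

Because the formula is written with df$, the slope column is labelled df$square_footage; passing data = df with a plain hhincome ~ square_footage formula would give shorter names.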
