Weighted linear regression in R with lm() and svyglm(). Same model, different results

I want to run a linear regression with survey weights in RStudio. I have seen that this is possible with the lm() function, which lets me specify the weights I want to use. However, it is also possible with the svyglm() function, which fits the regression using variables in a survey design object that has been weighted by the desired variable.

In theory, I see no reason for the results of these two regression models to differ, and the beta estimates are indeed the same. However, the standard errors differ between the two models, which leads to different p-values and therefore to different levels of significance.

Which model is the most appropriate one? Any help would be greatly appreciated.

Here is the R code:

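library(survey)  # provides svydesign() and svyglm()

dat <- read.csv("https://raw.githubusercontent.com/LucasTremlett/questions/master/questiondata.csv")

# Model 1: lm() with the weights argument
model.weighted1 <- lm(DV ~ IV1 + IV2 + IV3, data = dat, weights = weight)
summary(model.weighted1)

# Model 2: svyglm() on a survey design object built with the same weights
dat.weighted <- svydesign(ids = ~1, data = dat, weights = dat$weight)
model.weighted2 <- svyglm(DV ~ IV1 + IV2 + IV3, design = dat.weighted)
summary(model.weighted2)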

Mostly to confirm what is in the comments already:

  • lm and svyglm will always give the same point estimates, but will typically give different standard errors. In the terminology I use here, which @BenBolker has already linked (thanks!), lm assumes precision weights and svyglm assumes sampling weights.
  • For that particular survey data set, you have sampling weights and want svyglm.
  • From the description of the survey you would also expect to have a stratum variable, but it looks as though they don't supply it. If they did, it would go into svydesign and would be used to reduce the standard errors in svyglm.
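
The points above can be checked on simulated data. This is only a sketch under stated assumptions: the data frame, the column names IV1, IV2, DV and weight, and in particular the stratum column are made up for illustration and are not the asker's data. It assumes the survey package is installed.

```r
library(survey)

set.seed(1)
n <- 200

# Simulated stand-in for a survey data set (hypothetical values throughout)
dat <- data.frame(
  IV1     = rnorm(n),
  IV2     = rnorm(n),
  weight  = runif(n, 0.5, 3),                 # made-up sampling weights
  stratum = rep(c("a", "b", "c", "d"), n / 4) # made-up stratum variable
)
dat$DV <- 1 + 0.5 * dat$IV1 - 0.3 * dat$IV2 + rnorm(n)

# lm() treats the weights as precision weights
fit_lm <- lm(DV ~ IV1 + IV2, data = dat, weights = weight)

# svyglm() treats them as sampling weights
des     <- svydesign(ids = ~1, weights = ~weight, data = dat)
fit_svy <- svyglm(DV ~ IV1 + IV2, design = des)

# The point estimates should agree up to numerical tolerance
all.equal(unname(coef(fit_lm)), unname(coef(fit_svy)))

# The standard errors generally do not
se_lm  <- sqrt(diag(vcov(fit_lm)))
se_svy <- sqrt(diag(vcov(fit_svy)))
cbind(lm = se_lm, svyglm = se_svy)

# If a stratum variable were supplied, it would go into svydesign() like this
des_strat     <- svydesign(ids = ~1, strata = ~stratum, weights = ~weight, data = dat)
fit_svy_strat <- svyglm(DV ~ IV1 + IV2, design = des_strat)
se_strat      <- sqrt(diag(vcov(fit_svy_strat)))
```

In this simulation the stratum labels are unrelated to the outcome, so the stratified standard errors need not be noticeably smaller; in a real survey the gain depends on how strongly the strata are related to the variables in the model.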
