One-way ANOVA using the survey package in R

I am trying to identify the best way to run a one-way ANOVA on a complex survey design. After perusing Lumley's survey package documentation, I am none the wiser.

The survey::anova function is meant to 'Fit and compare hierarchical loglinear models for complex survey data', which is not what I am doing.

What I am trying to do

I have collected data on one categorical independent variable [3 levels] and one quantitative dependent variable. I want to use ANOVA to check whether the dependent variable changes with the level of the independent variable.

Here is an example of my process:

Load the survey package and create a complex survey design object

library(survey)

df <- data.frame(sex = c('F', 'O', NA, 'M', 'M', 'O', 'F', 'F'),
                 married = c(1,1,1,1,0,0,1,1),
                 pens = c(0, 1, 1, NA, 1, 1, 0, 0),
                 weight = c(1.12, 0.55, 1.1, 0.6, 0.23, 0.23, 0.66, 0.67))

svy_design <- svydesign(ids=~1, data=df, weights=~weight)

Borrowing from this post over here,

Method 1: using survey::aov

summary(aov(weight~sex,data = svy_design))

However, I got an error saying:

Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'object' in selecting a method for function 'summary': object 'api00' not found

Method 2: use survey::glm instead of anova

That same post has an answer/explanation with a case against using anova:

According to the main statistician of our institute there is no easy implementation of this kind of analysis in any common modeling environment. The reason for that is that ANOVA and ANCOVA are linear models that were not further developed after the emergence of General Linear Models (later Generalized linear models - GLMs) in the 70's. A normal linear regression model yields practically the same results as an ANOVA, but is much more flexible regarding variable choice. Since weighting methods exist for GLMs (see survey package in R) there is no real need to develop methods to weight for stratified sampling design in ANOVA... simply use a GLM instead.

summary(svyglm(weight~sex,svy_design))

I got this output:

Call:
svyglm(formula = weight ~ sex, design = svy_design)

Survey design:
svydesign(ids = ~1, data = df, weights = ~weight)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)   0.8730     0.1478   5.905  0.00412 **
sexM         -0.3756     0.1855  -2.024  0.11292   
sexO         -0.4174     0.1788  -2.334  0.07989 . 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 0.04270091)

Number of Fisher Scoring iterations: 2

My Questions:

  1. Why does method 1 throw an error?
  2. Is it possible to use the survey::aov function to accomplish my goal?
  3. If I were to use survey::glm [method 2], which value should I be looking at to identify a difference in means? Would it be the p value of the intercept?

I am a far cry from a stats buff, so please do explain in the simplest possible terms. Thank you!!

There is no such function as survey::aov, so you can't use it to accomplish your goal. Your code uses stats::aov, which does not know about survey design objects.
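One way to see where `aov` comes from is to ask R directly. The following is a minimal sketch, assuming the survey package is installed; it uses only base R introspection functions, and the variable names are illustrative.

```r
library(survey)

# Does the survey namespace export a function called 'aov'?
survey_has_aov <- "aov" %in% getNamespaceExports("survey")

# Which package owns the 'aov' that R finds on the search path?
aov_home <- environmentName(environment(aov))

print(survey_has_aov)
print(aov_home)
```

If the first value is FALSE and the second names the stats package, then a call to `aov()` is dispatched to stats::aov even after `library(survey)` has been loaded.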

You can use survey::svyglm. I will use one of the examples from the package, so I can actually run the code:

> model<-svyglm(api00~stype, design=dclus2)
> summary(model)

Call:
svyglm(formula = api00 ~ stype, design = dclus2)

Survey design:
dclus2<-svydesign(id=~dnum+snum, weights=~pw, data=apiclus2)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   692.81      30.28  22.878  < 2e-16 ***
stypeH        -94.47      27.66  -3.415  0.00156 ** 
stypeM        -50.46      23.01  -2.193  0.03466 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 17528.44)

Number of Fisher Scoring iterations: 2

There are three school types, E, M, and H. The two coefficients here estimate differences between the mean of E and the means of the other two groups, and the $p$-values test the hypotheses that H and E have the same mean and that M and E have the same mean. The intercept is the estimated mean of E, so its $p$-value only tests whether that mean is zero; it says nothing about differences between groups.
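To make the reading of the coefficients concrete, here is a small sketch that rebuilds the design from the call printed in the output above and compares the model coefficients with design-based group means. It assumes the `api` example data that ships with the survey package; `group_means` and `means` are names chosen for illustration.

```r
library(survey)

# Example data shipped with the survey package; loads apiclus2 among others
data(api)

# Same design call as printed in the summary() output above
dclus2 <- svydesign(id = ~dnum + snum, weights = ~pw, data = apiclus2)
model <- svyglm(api00 ~ stype, design = dclus2)

# Design-based mean of api00 within each school type (E, H, M)
group_means <- svyby(~api00, ~stype, dclus2, svymean)
means <- coef(group_means)

# The intercept should match the mean of E, and the other two
# coefficients should match the differences of H and M from E
print(means)
print(means[-1] - means[1])
print(coef(model))
```

Because the model contains a single three-level factor, the fitted values are just the three weighted group means, which is why the coefficients can be recovered from `svyby` in this way.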

If you want an overall test for the difference in means among the three groups, you can use the regTermTest function, which tests a term or set of terms in the model, e.g.,

> regTermTest(model,~stype)
Wald test for stype
 in svyglm(formula = api00 ~ stype, design = dclus2)
F =  12.5997  on  2  and  37  df: p= 6.7095e-05 

That F test is analogous to the one stats::aov gives. It's not identical, because this is survey data.
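The same two steps can be sketched on the design from the question. This reuses the question's data and formula unchanged (including its choice of `weight` as the outcome); `fit` and `overall` are illustrative names, and with only eight observations the result is for demonstration rather than inference.

```r
library(survey)

# Data and design copied from the question
df <- data.frame(sex = c('F', 'O', NA, 'M', 'M', 'O', 'F', 'F'),
                 married = c(1, 1, 1, 1, 0, 0, 1, 1),
                 pens = c(0, 1, 1, NA, 1, 1, 0, 0),
                 weight = c(1.12, 0.55, 1.1, 0.6, 0.23, 0.23, 0.66, 0.67))
svy_design <- svydesign(ids = ~1, data = df, weights = ~weight)

# Same formula as in the question; the row with missing sex is dropped
fit <- svyglm(weight ~ sex, design = svy_design)

# Overall Wald test of the 'sex' term: one F statistic and one p-value
# for the hypothesis that all three groups share the same mean
overall <- regTermTest(fit, ~sex)
print(overall)
```

The p-value printed by `regTermTest` is the one to read for an overall difference in means; the per-coefficient p-values from `summary(fit)` only compare each group with the reference level.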
