ANOVA in R: Degrees of freedom almost all equal to 1

Question

This is for my honors thesis! My advisor doesn't know how to use R and I don't know how to use anything else, so here I am.

I have a data set that begins like this:

> d.weight
    R   N   P  C D.weight
1   1   0   0 GO     45.3
2   2   0   0 GO     34.0
3   3   0   0 GO     19.1
4   4   0   0 GO     26.6
5   5   0   0 GO     23.5
6   1  45   0 GO     22.1
7   2  45   0 GO     15.5
8   3  45   0 GO     23.4
9   4  45   0 GO     15.8
10  5  45   0 GO     42.9
...

and so on.

R is rep, and there are 5 of them (1-5).
N is nitrogen level, and there are 5 as well (0, 45, 90, 180, 360).
P is phosphorus level, and there are 5 as well (0, 35, 70, 140, 280).
C is plant combination, and there are 4 (GO, GB, LO, LB).
D.weight is dry weight in grams.

However, when I do an ANOVA I get the wrong degrees of freedom. I usually run my ANOVAs on subsets of the full data set, but here is an analysis I wouldn't otherwise do, just so you can see that almost all of the Df are wrong.

> example.aov=aov(D.weight ~ R+N+P+C, data=d.weight)
> summary(example.aov)
             Df Sum Sq Mean Sq F value  Pr(>F)    
R             1   1158    1158   9.484 0.00226 ** 
N             1    202     202   1.657 0.19900    
P             1  11040   11040  90.408 < 2e-16 ***
C             3  41032   13677 112.010 < 2e-16 ***
Residuals   313  38220     122

So, basically, the only one that's right is the C factor. Is it because it has letters instead of numbers?

I found somewhere that if I wrap each term in interaction(), I get the right Df, but I don't know if that's the right thing to do overall. For example:

> example.aov2=aov(D.weight ~ interaction(R)+interaction(N)+interaction(P)+interaction(C), data=d.weight)
> summary(example.aov2)
                Df Sum Sq Mean Sq F value   Pr(>F)    
interaction(R)   4   7423    1856  19.544 2.51e-14 ***
interaction(N)   4    543     136   1.429    0.224    
interaction(P)   4  13788    3447  36.301  < 2e-16 ***
interaction(C)   3  41032   13677 144.042  < 2e-16 ***
Residuals      304  28866      95

I tried it with the C factor only, to see if it messed anything up:

> example.aov3=aov(D.weight ~ C, data=d.weight)
> summary(example.aov3)
             Df Sum Sq Mean Sq F value Pr(>F)    
C             3  41032   13677   85.38 <2e-16 ***
Residuals   316  50620     160                   
> 
> example.aov4=aov(D.weight ~ interaction(C), data=d.weight)
> summary(example.aov4)
                Df Sum Sq Mean Sq F value Pr(>F)    
interaction(C)   3  41032   13677   85.38 <2e-16 ***
Residuals      316  50620     160

And it looks the same. Should I be adding interaction() everywhere?

Thanks for the help!

Answer 1

R determines whether it should treat variables as categorical (ANOVA-type analysis) or continuous (regression-type analysis) by checking whether they are numeric or factor variables. Your R, N, and P columns are numeric, so each is fitted as a single slope with 1 Df; C holds letters, so it is treated as a factor, which is why only its Df looks right. Most simply, you can convert your predictor (independent) variables to factors via:

facs <- c("R", "N", "P")
d.weight[facs] <- lapply(d.weight[facs], factor)
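
To see what the conversion does to the degrees of freedom, here is a minimal sketch on a made-up data frame with the same layout as the one in the question. The name `toy` and the random D.weight values are assumptions for illustration only, not the asker's data, so only the Df column is meaningful.

```r
# Synthetic stand-in: 5 reps x 5 N levels x 5 P levels x 4 plant combinations
set.seed(1)
toy <- expand.grid(R = 1:5,
                   N = c(0, 45, 90, 180, 360),
                   P = c(0, 35, 70, 140, 280),
                   C = c("GO", "GB", "LO", "LB"))
toy$D.weight <- rnorm(nrow(toy), mean = 30, sd = 10)  # random values, illustration only

# Numeric predictors are fitted as slopes, so R, N and P get 1 Df each
fit_num <- aov(D.weight ~ R + N + P + C, data = toy)

# After conversion to factors, each 5-level predictor gets 4 Df
facs <- c("R", "N", "P")
toy[facs] <- lapply(toy[facs], factor)
fit_fac <- aov(D.weight ~ R + N + P + C, data = toy)

summary(fit_num)
summary(fit_fac)
```

Running str() on the data frame before and after the conversion is a quick way to check which columns are numeric and which are factors.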

If you want to create auxiliary variables instead of overwriting, you could do something like:

for (varname in facs) {
   # adds fR, fN, fP next to the original numeric columns
   d.weight[[paste0("f", varname)]] <- factor(d.weight[[varname]])
}

There might be a more compact way to do it but that should serve ...
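
One possibly more compact form, as a sketch: assign all the new columns in a single step by indexing the data frame with the new names. The small d.weight built here is only a stand-in made from the first five rows shown in the question, so that the snippet runs on its own.

```r
# Stand-in built from the first five rows shown in the question
d.weight <- data.frame(R = 1:5, N = 0, P = 0, C = "GO",
                       D.weight = c(45.3, 34.0, 19.1, 26.6, 23.5))

facs <- c("R", "N", "P")
# Creates fR, fN, fP as factors; the numeric R, N, P columns are left unchanged
d.weight[paste0("f", facs)] <- lapply(d.weight[facs], factor)

str(d.weight)
```

The model formula would then use the new names, for example D.weight ~ fR + fN + fP + C.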
