
Factorial Anova in R

I am having trouble understanding the summary of a factorial ANOVA in R. I don't understand why I get a Df of 2 for only the first variable. A, B, C and D all have 3 levels, so as I understand it I should get 2 Df for each of them (and correspondingly for their interactions). Please help me fix the code or understand the results.

PS Where can I find the list of options for summary()? I saw one example that removed the * after sig level and I want to see what options I have.

Thank you in advance

Here is Data I have

Complete data set I have

> data
   Runs I  A  B  C  D AB  E AD BC  F  G  H  J  K B1 B2     y
1     1 1 -1 -1 -1 -1  1  1  1  1  1  1 -1 -1 -1 -1  1 190.9
2     2 1  1 -1 -1 -1 -1 -1 -1  1  1  1  1  1  1 -1 -1 436.2
3     3 1 -1  1 -1 -1 -1  1  1 -1 -1  1  1  1 -1  1 -1 480.3
4     4 1  1  1 -1 -1  1 -1 -1 -1 -1  1 -1 -1  1  1  1 406.3
5     5 1 -1 -1  1 -1  1 -1  1 -1  1 -1  1 -1  1  1 -1 212.9
6     6 1  1 -1  1 -1 -1  1 -1 -1  1 -1 -1  1 -1  1  1 478.7
7     7 1 -1  1  1 -1 -1 -1  1  1 -1 -1 -1  1  1 -1  1 396.5
8     8 1  1  1  1 -1  1  1 -1  1 -1 -1  1 -1 -1 -1 -1 349.7
9     9 1 -1 -1 -1  1  1  1 -1  1 -1 -1 -1  1  1  1 -1 119.7
10   10 1  1 -1 -1  1 -1 -1  1  1 -1 -1  1 -1 -1  1  1 372.2
11   11 1 -1  1 -1  1 -1  1 -1 -1  1 -1  1 -1  1 -1  1 411.6
12   12 1  1  1 -1  1  1 -1  1 -1  1 -1 -1  1 -1 -1 -1 382.8
13   13 1 -1 -1  1  1  1 -1 -1 -1 -1  1  1  1 -1 -1  1 161.2
14   14 1  1 -1  1  1 -1  1  1 -1 -1  1 -1 -1  1 -1 -1 424.3
15   15 1 -1  1  1  1 -1 -1 -1  1  1  1 -1 -1 -1  1 -1 322.8
16   16 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 302.1
17   17 1  0  0  0  0  0  0  0  0  0  0  0 -1  1  0  0 302.4
18   18 1  0  0  0  0  0  0  0  0  0  0  0  1 -1  0  0 318.2
19   19 1  0  0  0  0  0  0  0  0  0  0  0 -1  1  0  0 332.8

###Factors
> A
 [1] -1 1  -1 1  -1 1  -1 1  -1 1  -1 1  -1 1  -1 1  0  0  0
Levels: -1 0 1
> B
 [1] -1 -1 1  1  -1 -1 1  1  -1 -1 1  1  -1 -1 1  1  0  0  0
Levels: -1 0 1
> C
 [1] -1 -1 -1 -1 1  1  1  1  -1 -1 -1 -1 1  1  1  1  0  0  0
Levels: -1 0 1
> D
 [1] -1 -1 -1 -1 -1 -1 -1 -1 1  1  1  1  1  1  1  1  0  0  0
Levels: -1 0 1

####Response variable
> data$y
 [1] 190.9 436.2 480.3 406.3 212.9 478.7 396.5 349.7 119.7 372.2 411.6 382.8 161.2 424.3 322.8 302.1 302.4 318.2
[19] 332.8

A=as.factor(data$A)
B=as.factor(data$B)
C=as.factor(data$C)
D=as.factor(data$D)
out3=lm(data$y~C+B+A+D)
fit1=aov(out3)
summary(fit1)

> summary(fit1)
            Df Sum Sq Mean Sq F value Pr(>F)
C            2   2743    1372   0.170 0.8456
B            1  26896   26896   3.332 0.0910 .
A            1  45839   45839   5.679 0.0331 *
D            1  12928   12928   1.602 0.2279
Residuals   13 104934    8072

The same ANOVA with the variables in a different order:

> summary(fit1)
            Df Sum Sq Mean Sq F value Pr(>F)
B            2  28199   14100   1.747 0.2129
A            1  45839   45839   5.679 0.0331 *
D            1  12928   12928   1.602 0.2279
C            1   1440    1440   0.178 0.6796
Residuals   13 104934    8072

If I run the ANOVA with only 2 levels (excluding 0 for all variables and using only rows [1:16], since the last 3 runs are at the "0" level), it comes out fine: I get a Df of 1 for every variable except the residuals.

At first I could not see how the degrees of freedom could come out wrong. But sometimes we look for complicated explanations and overlook the simple ones. Here is what the problem is:

data <- read.table(header=T,text='Runs I  A  B  C  D AB  E AD BC  F  G  H  J  K B1 B2     y
1     1 1 -1 -1 -1 -1  1  1  1  1  1  1 -1 -1 -1 -1  1 190.9
2     2 1  1 -1 -1 -1 -1 -1 -1  1  1  1  1  1  1 -1 -1 436.2
3     3 1 -1  1 -1 -1 -1  1  1 -1 -1  1  1  1 -1  1 -1 480.3
4     4 1  1  1 -1 -1  1 -1 -1 -1 -1  1 -1 -1  1  1  1 406.3
5     5 1 -1 -1  1 -1  1 -1  1 -1  1 -1  1 -1  1  1 -1 212.9
6     6 1  1 -1  1 -1 -1  1 -1 -1  1 -1 -1  1 -1  1  1 478.7
7     7 1 -1  1  1 -1 -1 -1  1  1 -1 -1 -1  1  1 -1  1 396.5
8     8 1  1  1  1 -1  1  1 -1  1 -1 -1  1 -1 -1 -1 -1 349.7
9     9 1 -1 -1 -1  1  1  1 -1  1 -1 -1 -1  1  1  1 -1 119.7
10   10 1  1 -1 -1  1 -1 -1  1  1 -1 -1  1 -1 -1  1  1 372.2
11   11 1 -1  1 -1  1 -1  1 -1 -1  1 -1  1 -1  1 -1  1 411.6
12   12 1  1  1 -1  1  1 -1  1 -1  1 -1 -1  1 -1 -1 -1 382.8
13   13 1 -1 -1  1  1  1 -1 -1 -1 -1  1  1  1 -1 -1  1 161.2
14   14 1  1 -1  1  1 -1  1  1 -1 -1  1 -1 -1  1 -1 -1 424.3
15   15 1 -1  1  1  1 -1 -1 -1  1  1  1 -1 -1 -1  1 -1 322.8
16   16 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 302.1
17   17 1  0  0  0  0  0  0  0  0  0  0  0 -1  1  0  0 302.4
18   18 1  0  0  0  0  0  0  0  0  0  0  0  1 -1  0  0 318.2
19   19 1  0  0  0  0  0  0  0  0  0  0  0 -1  1  0  0 332.8')

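# Convert the design columns to factors (as in the question); without this,
# A, B, C and D are not defined and model.matrix() fails.
A <- as.factor(data$A)
B <- as.factor(data$B)
C <- as.factor(data$C)
D <- as.factor(data$D)

# One dummy column per non-baseline level (baseline is -1), plus an intercept.
a.dummies <- model.matrix(~A)
b.dummies <- model.matrix(~B)
c.dummies <- model.matrix(~C)
d.dummies <- model.matrix(~D)

# Drop the intercept column from each and bind all dummies into one matrix.
a<-cbind(a.dummies[,-1],b.dummies[,-1])
b<-cbind(c.dummies[,-1],d.dummies[,-1])
all<-cbind(a,b)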

I took the liberty to create the dummies on my own to check them one by one. And the problem revealed itself. Simple correlation table:

cor(all)

           A0         A1         B0         B1         C0         C1         D0         D1
A0  1.0000000 -0.3692745  1.0000000 -0.3692745  1.0000000 -0.3692745  1.0000000 -0.3692745
A1 -0.3692745  1.0000000 -0.3692745  0.1363636 -0.3692745  0.1363636 -0.3692745  0.1363636
B0  1.0000000 -0.3692745  1.0000000 -0.3692745  1.0000000 -0.3692745  1.0000000 -0.3692745
B1 -0.3692745  0.1363636 -0.3692745  1.0000000 -0.3692745  0.1363636 -0.3692745  0.1363636
C0  1.0000000 -0.3692745  1.0000000 -0.3692745  1.0000000 -0.3692745  1.0000000 -0.3692745
C1 -0.3692745  0.1363636 -0.3692745  0.1363636 -0.3692745  1.0000000 -0.3692745  0.1363636
D0  1.0000000 -0.3692745  1.0000000 -0.3692745  1.0000000 -0.3692745  1.0000000 -0.3692745
D1 -0.3692745  0.1363636 -0.3692745  0.1363636 -0.3692745  0.1363636 -0.3692745  1.0000000
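A correlation of exactly 1 between A0, B0, C0 and D0 suggests these are the same column. As a rough cross-check, the sketch below rebuilds the four factors from the design shown in the question (the vectors are typed in by hand here rather than read from `data`, and `X` is just an illustrative name) and compares the number of columns in the full design matrix with its rank:

```r
# Rebuild the four design factors: 16 factorial runs plus 3 centre points.
A <- factor(c(rep(c(-1, 1), 8), 0, 0, 0))
B <- factor(c(rep(c(-1, -1, 1, 1), 4), 0, 0, 0))
C <- factor(c(rep(rep(c(-1, 1), each = 4), 2), 0, 0, 0))
D <- factor(c(rep(c(-1, 1), each = 8), 0, 0, 0))

# Full design matrix: intercept plus two dummies per factor.
X <- model.matrix(~ A + B + C + D)

ncol(X)     # 9 columns requested
qr(X)$rank  # 6 linearly independent columns

# The "0" dummies are literally identical columns.
all(X[, "A0"] == X[, "B0"], X[, "A0"] == X[, "C0"], X[, "A0"] == X[, "D0"])
```

With 19 runs and only 6 independent columns, 13 residual degrees of freedom remain, which agrees with the Residuals row in the ANOVA tables of the question.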

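lm (like many other model-fitting functions) drops any column of the design matrix that is an exact linear combination of columns already in the model; a correlation of exactly 1 between two dummies means they are duplicate columns. Here the 0 level occurs only in the three centre-point runs (17 to 19), where all four factors are 0 at the same time, so A0, B0, C0 and D0 are one and the same column. In your first model C comes first, so C0 is kept and B0, A0 and D0 are dropped. That effectively reduces A, B and D to 2 levels, which is why they get 1 degree of freedom each. Whichever factor is listed first in the formula keeps its 2 Df, which is why B has them in your second fit.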
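The dropped columns can also be seen directly in the fitted model, where they appear as NA coefficients. The following is a minimal sketch, with the design and response typed in by hand from the question; `fit` and `aliased` are illustrative names, not part of the original code:

```r
# Design factors and response, copied from the data in the question.
A <- factor(c(rep(c(-1, 1), 8), 0, 0, 0))
B <- factor(c(rep(c(-1, -1, 1, 1), 4), 0, 0, 0))
C <- factor(c(rep(rep(c(-1, 1), each = 4), 2), 0, 0, 0))
D <- factor(c(rep(c(-1, 1), each = 8), 0, 0, 0))
y <- c(190.9, 436.2, 480.3, 406.3, 212.9, 478.7, 396.5, 349.7,
       119.7, 372.2, 411.6, 382.8, 161.2, 424.3, 322.8, 302.1,
       302.4, 318.2, 332.8)

# Same term order as the first model in the question.
fit <- lm(y ~ C + B + A + D)

# Coefficients that could not be estimated are reported as NA.
coef(fit)
aliased <- names(coef(fit))[is.na(coef(fit))]
aliased       # "B0" "A0" "D0"

# alias() shows how each dropped column depends on the retained ones.
alias(fit)
```

Changing the order of the terms in the formula changes which of the four "0" dummies is kept, but three of them are always aliased.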

Mystery solved!!!
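On the PS about options for summary(): the help pages `?summary.aov` and `?print.summary.aov` list the available arguments. The significance stars are controlled at the printing step. The sketch below uses the built-in `PlantGrowth` data set purely as a stand-in (it is not the data from the question), and `fit_pg` is an illustrative name:

```r
# Stand-in one-way ANOVA on a data set that ships with R.
fit_pg <- aov(weight ~ group, data = PlantGrowth)

# Suppress the stars for this one call only.
print(summary(fit_pg), signif.stars = FALSE)

# Or switch them off for the whole session:
# options(show.signif.stars = FALSE)
```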


 