Correct use of sapply with Anova on multiple subsets in R
I am trying to run a two-way ANOVA on multiple subsets of a data frame without having to subset the data by hand each time, as this is inefficient.
Example data:
DF<-structure(list(Sample = c(666L, 676L, 686L, 667L, 677L, 687L,
822L, 832L, 842L, 824L, 834L, 844L), Time = c(300L, 300L, 300L,
300L, 300L, 300L, 400L, 400L, 400L, 400L, 400L, 400L), Ploidy = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("2n",
"3n"), class = "factor"), Tissue = c("muscle", "muscle", "muscle",
"liver", "liver", "liver", "intestine", "intestine", "intestine",
"gill", "gill", "gill"), X.lipid = c(1.1, 0.8, 1.3, 3.7, 3.9,
3.8, 5.2, 3.4, 6, 7.6, 10.4, 6.7), l.dec = c(0.011, 0.008, 0.013,
0.037, 0.039, 0.038, 0.052, 0.034, 0.06, 0.076, 0.104, 0.067),
l.arc = c(0.105074124512229, 0.0895624074394449, 0.114266036973812,
0.193560218793138, 0.19879088899975, 0.196192082631721, 0.230059118691331,
0.185452088760136, 0.247467063170448, 0.279298057669285,
0.328359182374352, 0.261824790465914)), .Names = c("Sample",
"Time", "Ploidy", "Tissue", "X.lipid", "l.dec", "l.arc"), row.names = c(1L,
2L, 3L, 4L, 5L, 6L, 69L, 70L, 71L, 72L, 73L, 74L), class = "data.frame")
I have come across similar examples: "Anova, for loop to apply function" and "ANOVA on multiple responses, by multiple groups NOT part of formula".
I can get close, but I do not believe this is correct, as it uses aov rather than anova:
x<- unique(DF$Tissue)
sapply(x, function(my) {
f <- as.formula(paste("l.dec~Time*Ploidy"))
aov(f, data=DF)
}, simplify=FALSE)
If I switch aov for anova, it returns an error message:
Error in UseMethod("anova") :
no applicable method for 'anova' applied to an object of class "formula"
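The error is consistent with anova() being a generic that dispatches on a fitted model object rather than on a formula, so the model has to be fitted first. A minimal sketch of that, using the built-in warpbreaks data as a stand-in (not the data from this question):

```r
# anova() needs a fitted model, so fit with lm() first and pass the result on.
# warpbreaks ships with R; its columns are breaks, wool and tension.
f <- breaks ~ wool * tension

fit <- lm(f, data = warpbreaks)
tab <- anova(fit)            # sequential (Type I) ANOVA table
print(tab)

# aov() fits the same linear model through lm() internally, so its summary
# should report the same sums of squares and F tests for this formula.
print(summary(aov(f, data = warpbreaks)))
```

If that holds for your data, the aov version is not wrong as such; it just returns a different kind of object from anova(lm(...)).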
The long way around, which I believe is correct, is as follows:
#Subset by each Tissue type (just one here for e.g.)
muscle<- subset (DF, Tissue == "muscle")
#Perform Anova
anova(lm(l.dec ~ Ploidy * Time, data = muscle))
However, in the main data frame I have many tissue types and want to avoid performing this subset for each one.
I believe the sapply approach is close, but I need help with the final stages.
Building on @user20650 and my comments above, I would suggest first using sapply with lm to generate your list of models, and then using sapply again on that list to generate your ANOVA tables. That way the list of models will be available to you, so you can get coefficients, fitted values, residuals, etc.
x <- unique(DF$Tissue)
models <- sapply(x, function(my) {
lm(l.dec ~ Time * Ploidy, data=DF, subset = Tissue==my)
}, simplify=FALSE)
ANOVA.tables <- sapply(models, anova, simplify=FALSE)
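To show what having the list of models buys you, here is a sketch of the same two-step pattern followed by pulling coefficients and p-values out of the lists. It uses the built-in CO2 data as a stand-in, with Type playing the role of Tissue; that swap is an assumption made for illustration, because the 12-row example above appears to have only one Time and one Ploidy value per tissue, so the per-tissue interaction model cannot be estimated from it.

```r
# Stand-in data: CO2 ships with R (columns Plant, Type, Treatment, conc, uptake).
groups <- levels(CO2$Type)

# Step 1: one fitted model per group, kept in a named list.
models <- sapply(groups, function(g) {
  lm(uptake ~ Treatment * conc, data = CO2, subset = Type == g)
}, simplify = FALSE)

# Step 2: one ANOVA table per fitted model.
tables <- sapply(models, anova, simplify = FALSE)

# Because the models are kept, other quantities are one call away.
coefs <- sapply(models, coef)   # matrix: one column of coefficients per group

# p-values per term (the Residuals row has none, so it is dropped).
pvals <- sapply(tables, function(tab) {
  keep <- rownames(tab) != "Residuals"
  setNames(tab[["Pr(>F)"]][keep], rownames(tab)[keep])
})

print(tables)
print(coefs)
print(pvals)
```

The list names come from the grouping values, so models[["Quebec"]] or tables$Mississippi retrieves a single group's result.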