Looping through a one-way ANOVA throws an error in R

Question

I'm trying to loop through a large data frame (5413 columns) and run an ANOVA on each column, but I get an error when I do so.

I'd like to have the p-value from each ANOVA written to a new row in a data frame containing the column titles. Given my limited knowledge, though, I'm currently writing the p-value outputs to files that I can parse in bash.

Here's an example layout of the data:

data()
Name, Group, aaaA, aaaE, bbbR, cccD
Apple, Fruit, 1.23, 0.45, 0.3, 1.1
Banana, Fruit, 0.54, 0.12, 2.0, 1.32
Carrot, Vegetable, 0.01, 0.05, 0.45, 0.9
Pear, Fruit, 0.1, 0.2, 0.1, 0.3
Fox, Animal, 1.0, 0.9, 1.2, 0.8
Dog, Animal, 1.2, 1.1, 0.8, 0.7

And here is the output from dput:

structure(list(Name = structure(c(1L, 2L, 3L, 6L, 5L, 4L), .Label = c("Apple", 
"Banana", "Carrot", "Dog", "Fox", "Pear"), class = "factor"), 
    Group = structure(c(2L, 2L, 3L, 2L, 1L, 1L), .Label = c(" Animal", 
    " Fruit", " Vegetable"), class = "factor"), aaaA = c(1.23, 
    0.54, 0.01, 0.1, 1, 1.2), aaaE = c(0.45, 0.12, 0.05, 0.2, 
    0.9, 1.1), bbbR = c(0.3, 2, 0.45, 0.1, 1.2, 0.8), cccD = c(1.1, 
    1.32, 0.9, 0.3, 0.8, 0.7)), class = "data.frame", row.names = c(NA, 
-6L))

To get a successful output for a single column I do:

summary(aov(aaaA ~ Group, data=data))[[1]][["Pr(>F)"]]

I then try to implement that in a loop:

for(i in names(data[3:6])){
out <- summary(aov(i ~ Group, data=data))[[1]][["Pr(>F)"]]
write.csv(out, i)}

Which returns the error:

Error in model.frame.default(formula = i ~ Group, data = test, drop.unused.levels = TRUE) : 
variable lengths differ (found for 'Group')

Can anyone help with getting around the error or implementing a per-column ANOVA?

Answer 1

We can build each formula from a string, fit the models with lapply, and extract the p-values afterwards (this example models aaaA against every other column of df):

to_use<-setdiff(names(df),"aaaA")
lapply(to_use,function(x) summary(do.call(aov,list(as.formula(paste("aaaA","~",x)),
                                           data=df))))

This gives:

[[1]]
            Df Sum Sq Mean Sq
Name         5   1.48   0.296

[[2]]
            Df Sum Sq Mean Sq F value Pr(>F)
Group        2 0.8113  0.4057   1.819  0.304
Residuals    3 0.6689  0.2230               

[[3]]
            Df Sum Sq Mean Sq F value Pr(>F)  
aaaE         1 0.9286  0.9286   6.733 0.0604 .
Residuals    4 0.5516  0.1379                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

[[4]]
            Df Sum Sq Mean Sq F value Pr(>F)
bbbR         1  0.043  0.0430    0.12  0.747
Residuals    4  1.437  0.3593               

[[5]]
            Df Sum Sq Mean Sq F value Pr(>F)
cccD         1 0.1129  0.1129    0.33  0.596
Residuals    4 1.3673  0.3418
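
The error in the question's loop arises because `i` holds a character string, so `i ~ Group` refers to a variable literally named `i` (a length-one vector) rather than to the column it names; its length then differs from that of `Group`. Note also that the code above regresses aaaA on every other column, whereas the question asks for each measurement column modelled against Group. The following is a minimal sketch of that per-column version, assuming the example data from the question. The names `dat`, `response_cols`, and `anova_p` are illustrative and not part of the original post, and the leading spaces in the Group labels of the dput output are dropped here.

```r
# Rebuild the example data from the question (Group labels trimmed).
dat <- data.frame(
  Name  = c("Apple", "Banana", "Carrot", "Pear", "Fox", "Dog"),
  Group = factor(c("Fruit", "Fruit", "Vegetable", "Fruit", "Animal", "Animal")),
  aaaA  = c(1.23, 0.54, 0.01, 0.1, 1, 1.2),
  aaaE  = c(0.45, 0.12, 0.05, 0.2, 0.9, 1.1),
  bbbR  = c(0.3, 2, 0.45, 0.1, 1.2, 0.8),
  cccD  = c(1.1, 1.32, 0.9, 0.3, 0.8, 0.7)
)

# Treat every column except the identifier columns as a response.
response_cols <- setdiff(names(dat), c("Name", "Group"))

# For each response, build the formula `<column> ~ Group` from the column
# name, fit the one-way ANOVA, and keep the p-value of the Group term
# (the first entry; the second belongs to the Residuals row and is NA).
p_values <- vapply(response_cols, function(col) {
  fit <- aov(reformulate("Group", response = col), data = dat)
  summary(fit)[[1]][["Pr(>F)"]][1]
}, numeric(1))

# One row per column: the column title and its p-value.
anova_p <- data.frame(column = response_cols, p_value = unname(p_values))
print(anova_p)
```

Collecting the results in a single data frame avoids writing one file per column; if a file is still wanted, `write.csv(anova_p, "anova_p.csv", row.names = FALSE)` would write them all at once (the file name is only an example). With thousands of columns, the p-values would normally also need a multiple-testing correction, for instance with `p.adjust`.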

Answer 1: 0 · accepted · 2019-06-12 12:26:01
