In R, how can I use one table to define the columns of another table to use in a two-way ANOVA?

Question

I have two tables, m and epi. The epi table contains names of columns of m.

> head(m[, 1:6])
         Geno      11DPW 8266 80647 146207 146227
1 SB002XSB012 0.87181895  G/G   C/C    G/G    A/A
2 SB002XSB018         Na  G/G   C/T    G/G    A/A
3 SB002XSB044   1.057744  G/G   C/C    G/G    A/A
4 SB002XSB051 1.64736814  G/G   C/C    G/G    A/A
5 SB002XSB067 0.69987475  A/G   C/C    G/G    A/G
6 SB002XSB073 0.60552177  A/G   C/C    G/G    A/G

> dim(m)
[1]   167 28234
and

> head(epi)
      SNP1     SNP2
1  7789543 12846898
2 12846898  7789543
3 24862913  4603896
4  4603896 24862913
5 50592569  7789543
6 27293494 57162585

> dim(epi)
[1] 561   2

I want to take each row of epi and run a two-way ANOVA of 11DPW in m on the two columns of m named in that row. I tried:

f<-function (x) {
 anova(lm (as.numeric(m$"11DPW")~ m[,epi[x,1]]*m[,epi[x,2]]))
 }
apply(epi,1,f)

and got this error:

Error in `[.data.frame`(m, , epi[x, 1]) : undefined columns selected

Any suggestions?

Thanks,
Imri

Answer 1

Putting aside for a moment the complications of using integers as column names (that is, assuming that issue is handled correctly), you will still get the `"undefined columns selected"` error if a column named in `epi` does not exist in `m`. You can drop such rows first:

offendingElements <- !sapply(epi, "%in%", colnames(m))

# since an offending element likely disqualifies the row from the anova test, identify the whole row
offendingRows <- which(rowSums(offendingElements) > 0)

# perform your apply statement over the remaining rows
# (a logical mask also behaves correctly when offendingRows is empty)
epi[!seq_len(nrow(epi)) %in% offendingRows, ]
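A note on the row-removal step, sketched on made-up toy data (`flags`, `epi_toy` and `none` are hypothetical). Converting the linear indices from `which()` into row numbers with `%% nrow(epi)` maps the last row to 0, and `epi[-offendingRows, ]` returns zero rows when nothing is offending, because `-integer(0)` is still an empty index. Counting offending entries per row with `rowSums()` and subsetting with a logical mask avoids both problems:

```r
# Hypothetical 3 x 2 matrix of "offending" flags, as !sapply(epi, "%in%", ...) gives
flags <- matrix(c(FALSE, FALSE, TRUE,    # SNP1 column: row 3 is offending
                  FALSE, FALSE, FALSE),  # SNP2 column: nothing offending
                ncol = 2)

# Linear index 3, modulo 3 rows, gives 0, so row 3 would not be removed
which(flags) %% nrow(flags)          # 0

# rowSums() counts offending entries per row and gives the row number directly
which(rowSums(flags) > 0)            # 3

# Hypothetical toy epi in which no row is offending
epi_toy <- data.frame(SNP1 = c(1, 2, 3), SNP2 = c(4, 5, 6))
none <- which(c(FALSE, FALSE, FALSE))  # integer(0)

nrow(epi_toy[-none, ])               # 0: a negative empty index selects nothing
keep <- !seq_len(nrow(epi_toy)) %in% none
nrow(epi_toy[keep, ])                # 3: the logical mask keeps every row
```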

CLEANING UP THE FUNCTION USED IN APPLY

When you use apply(epi, 1, f), what is passed to each call of f is an entire row of epi (as a vector), not a row number. Therefore, epi[x, 1] does not give you the result you want. For example, on the 7th iteration of the apply statement, x is the equivalent of epi[7, ]. To get the first column, you just need to index x directly. So, in your function:

Instead of       epi[x, 1]   and    epi[x, 2]
You want to use  x[[1]]      and    x[[2]]
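To see what apply() hands to the function, here is a sketch on a hypothetical two-row copy of epi (`epi_toy`) and a hypothetical stand-in for m (`m_toy`). It also shows where the original error comes from: x holds SNP IDs, so epi[x, 1] asks for rows that do not exist and returns NA, and selecting columns with NA is what raises "undefined columns selected".

```r
# Hypothetical two-row stand-in for epi
epi_toy <- data.frame(SNP1 = c(7789543, 24862913),
                      SNP2 = c(12846898, 4603896))

# apply() turns epi_toy into a matrix and passes each row as a plain vector,
# so x[[1]] and x[[2]] are the two SNP IDs of that row
pairs <- apply(epi_toy, 1, function(x) paste(x[[1]], x[[2]]))
pairs                                  # "7789543 12846898" "24862913 4603896"

# The original f() used x as row numbers: these rows do not exist, giving NA NA
epi_toy[c(7789543, 12846898), 1]

# ...and selecting columns with NA is what raises the error
m_toy <- data.frame(a = 1:2, b = 3:4)  # hypothetical stand-in for m
tryCatch(m_toy[, epi_toy[c(7789543, 12846898), 1]], error = conditionMessage)
```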

That is the first part. Second, we need to deal with integers used as column names. VERY IMPORTANT: if you use m[, 7823], you will get the 7823rd column of m. You have to be sure to convert the integers to strings, indicating that you want the column NAMED "7823", NOT (necessarily) the 7823rd column.

Use as.character for this:

   m[, as.character(x[[1]])]
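A small sketch of the difference between position and name, using a hypothetical data frame `m_toy` whose digit-only column names are in a different order from their positions:

```r
# Hypothetical toy data frame with digit-only column names
m_toy <- data.frame(a = 1:3, b = 4:6, c = 7:9)
colnames(m_toy) <- c("3", "1", "2")

m_toy[, 1]                 # by position: the first column (named "3") -> 1 2 3
m_toy[, "1"]               # by name: the column called "1"             -> 4 5 6
m_toy[, as.character(1)]   # same as above                              -> 4 5 6

# A number past ncol() is treated as an out-of-range position, not a name
tryCatch(m_toy[, 7823], error = conditionMessage)  # "undefined columns selected"
```

With m having 28234 columns, an ID such as 7789543 used as a position would be out of range in the same way.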

PUTTING IT ALL TOGETHER

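offendingElements <- !sapply(epi, "%in%", colnames(m))
offendingRows <- which(rowSums(offendingElements) > 0)

# as.character() before as.numeric(), in case 11DPW was read as a factor
# (the "Na" entries suggest it is not a numeric column)
y <- as.numeric(as.character(m$"11DPW"))

apply(epi[!seq_len(nrow(epi)) %in% offendingRows, ], 1, function (x)
   anova( lm ( y ~ m[, as.character(x[[1]]) ] * m[, as.character(x[[2]]) ] ))
)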
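To try the whole pipeline without the real 167 x 28234 data, here is a sketch on a small simulated data set. Everything in it (`m_toy`, `epi_toy`, the column IDs 100/200/300/999 and the random genotypes) is made up for illustration. It uses rowSums() to find offending rows and converts 11DPW with as.character() first, as a precaution in case that column is a factor:

```r
set.seed(1)
n <- 30
# Hypothetical stand-in for m: a response stored as text plus three SNP columns
m_toy <- data.frame(
  Geno    = paste0("G", seq_len(n)),
  `11DPW` = as.character(round(rnorm(n, 1, 0.3), 3)),
  `100`   = sample(c("A/A", "A/G", "G/G"), n, replace = TRUE),
  `200`   = sample(c("C/C", "C/T"), n, replace = TRUE),
  `300`   = sample(c("G/G", "A/G"), n, replace = TRUE),
  check.names = FALSE  # keep the digit-only names as they are
)
# Hypothetical stand-in for epi; the pair (200, 999) names a missing column
epi_toy <- data.frame(SNP1 = c(100, 200, 200), SNP2 = c(200, 300, 999))

# Keep only rows whose two IDs are both column names of m_toy
bad  <- !sapply(epi_toy, "%in%", colnames(m_toy))
keep <- rowSums(bad) == 0

y <- as.numeric(as.character(m_toy[["11DPW"]]))
fits <- apply(epi_toy[keep, ], 1, function(x)
  anova(lm(y ~ m_toy[, as.character(x[[1]])] * m_toy[, as.character(x[[2]])])))
length(fits)  # one ANOVA table per valid pair
```

Because each anova() result is itself a data frame, apply() returns a list here rather than trying to simplify it into a matrix.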

There is an alternative way of dealing with the names; the simplest is to turn them into syntactically valid strings with make.names:

# clean up the elements in epi (e.g. 7789543 becomes "X7789543")
epi.clean <- sapply(epi, make.names)

# clean up m's column names the same way; note that "11DPW" becomes "X11DPW"
colnames(m) <- make.names(colnames(m))

# use epi.clean in your apply statement. Don't forget offendingRows
apply(epi.clean[!seq_len(nrow(epi.clean)) %in% offendingRows, ], 1, function (x)
   anova( lm ( as.numeric(as.character(m$X11DPW)) ~ m[, x[[1]] ] * m[, x[[2]] ] ))
)
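A quick check of what make.names() does to the names involved (`df_toy` is a hypothetical data frame). Because names starting with a digit get an "X" prefix, the response column 11DPW is renamed as well, so after cleaning it has to be referred to as m$X11DPW; m$"11DPW" would return NULL.

```r
# Names starting with a digit get an "X" prefix; valid names are unchanged
make.names(c("11DPW", "8266", "Geno"))   # "X11DPW" "X8266" "Geno"

# Numbers are converted to character first, so epi entries match m's new names
make.names(7789543)                      # "X7789543"

# After renaming, the old name no longer matches (hypothetical toy data frame)
df_toy <- data.frame(X11DPW = c(0.87, 1.06))
is.null(df_toy$"11DPW")                  # TRUE
df_toy$X11DPW                            # 0.87 1.06
```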

Answer 2

I suspect your values in epi are numbers, but what you want to use are their character equivalents, since the column names in m are character strings (even though these strings are made up of numerals). Try this instead:

m[[as.character(epi[x, 1])]] and m[[as.character(epi[x, 2])]] (etc.)

The [[ operator is quirky but very cool.
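A sketch of [[ on a hypothetical two-column `m_toy`, looping by a row index `i` as epi[x, ] in this answer assumes. [[ takes a single name and returns that column as a vector. Giving it both names at once (as.character(epi_toy[i, ]) has length 2) does not return two columns; it indexes recursively, which here fails:

```r
# Hypothetical toy data with digit-only column names
m_toy <- data.frame(`8266` = c("G/G", "A/G"), `80647` = c("C/C", "C/T"),
                    check.names = FALSE)
epi_toy <- data.frame(SNP1 = 8266, SNP2 = 80647)

i <- 1                                           # a row index into epi_toy
snp1 <- m_toy[[as.character(epi_toy[i, 1])]]     # the column named "8266"
snp2 <- m_toy[[as.character(epi_toy[i, 2])]]     # the column named "80647"
snp1                                             # "G/G" "A/G"

# Both names at once means m_toy[["8266"]][["80647"]], which is an error
res <- tryCatch(m_toy[[as.character(epi_toy[i, ])]],
                error = function(e) "error")
res                                              # "error"
```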

Answer 1 (1 vote), posted 2012-12-17 19:35:52
Answer 2 (0 votes), posted 2012-12-17 13:34:27