R: find matching columns in two data frames for t-test statistics (R beginner)
I would like to perform a two-sample t-test on my data within R. Given two high-dimensional data frames, I need to loop through the matching columns (matched by the strings in colnames(), i.e. the header) over all rows and perform the test for each column pair, one column from df1 and one from df2.
The problem is that the columns of the data frames are not in the same order, i.e. col1 from df1 doesn't match col1 from df2, and df2 has additional columns that don't exist in df1. I've never used R for such tasks, and I wonder if there is a fast and handy way to find the matching column pairs in the data frames for the t-test.
I thought about for-loops, but I think they would be very inefficient for large data frames.
Thank you in advance for any help.
EDITED: two small example data frames, df1 and df2
df1
"Row\Column" "A2" "A1" "A4" "A3"
"id_1" 10 20 0 40
"id_2" 5 15 25 35
"id_3" 8 0 12 16
"id_4" 17 25 0 40
df2
"Row\Column" "A3" "A8" "A5" "A6" "A1" "A7" "A4" "A2"
"id_1" 0 2 0 4 0 1 2 3
"id_2" 1 5 8 3 4 5 6 7
"id_3" 2 10 6 9 8 9 10 11
"id_4" 7 2 10 2 55 0 0 0
"id_5" 0 1 0 0 9 1 3 4
"id_6" 8 0 1 2 7 2 3 0
Matching columns are simply the columns whose names in df1 match column names in df2. For example, two matching columns in df1 and df2 are "A1" and "A1", "A2" and "A2", and so on.
mapply is the function you are looking for. If the columns of your data.frames matched up, you could simply use
mapply(t.test, df1, df2)
However, since they do not, you need to identify which column of df1 goes with which column of df2. Fortunately, the indexing options in R are clever: if you feed in a vector (a collection) of column names, you get back those columns in the order given. This makes life easy.
# find the matching names
## this will give you those names in df1 that are also in df2
## and *only* such names (i.e., strict intersect)
matchingNames <- names(df1)[names(df1) %in% names(df2)]
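To make the ordering concrete, here is a small sketch with two made-up data frames, `a` and `b` (hypothetical, not the data from the question). It suggests that the `%in%` idiom keeps the names in the order of the data frame being indexed, and that base R's `intersect()` gives the same result here.

```r
# Toy data frames (hypothetical): they share the columns "x" and "y"
a <- data.frame(x = 1, y = 2, w = 3)
b <- data.frame(z = 0, y = 5, x = 6)

# Names of `a` that also occur in `b`, in the order they appear in `a`
in_a_order <- names(a)[names(a) %in% names(b)]   # "x" "y"

# intersect() follows the order of its first argument as well
via_intersect <- intersect(names(a), names(b))   # "x" "y"

# Indexing the other way round yields the order of `b` instead
in_b_order <- names(b)[names(b) %in% names(a)]   # "y" "x"
```

Whichever order you pick, the important part is to use the same name vector to index both data frames.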
Notice that matchingNames has some order to it. Now look at what happens when you use the matchingNames vector as an index to the columns of each of df1 and df2 (note also the column order):
df1[, matchingNames]
df2[, matchingNames]
matchingNames
Therefore, we now have two data.frames with properly matched columns, which we can mapply over.
## mapply will apply a function to each data.frame, one pair of columns at a time
## The first argument to `mapply` is your function, in this example, `t.test`
## The second and third arguments are the data.frames (or lists) to simultaneously iterate over
mapply(t.test, df1[, matchingNames], df2[, matchingNames])
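By default `mapply` tries to simplify its result, which for `t.test` objects can be awkward to read. The following is a minimal sketch of one way to pull out just the p-values, using the example tables from the question. Two choices here are my own assumptions rather than part of the answer above: `SIMPLIFY = FALSE`, so that each test comes back as a separate list element, and list-style indexing `df1[matchingNames]`, which still returns a data frame when only one column name matches. The names `tests` and `pvals` are illustrative only.

```r
# Example data copied from the tables in the question
df1 <- data.frame(
  A2 = c(10, 5, 8, 17),
  A1 = c(20, 15, 0, 25),
  A4 = c(0, 25, 12, 0),
  A3 = c(40, 35, 16, 40),
  row.names = paste0("id_", 1:4)
)
df2 <- data.frame(
  A3 = c(0, 1, 2, 7, 0, 8),
  A8 = c(2, 5, 10, 2, 1, 0),
  A5 = c(0, 8, 6, 10, 0, 1),
  A6 = c(4, 3, 9, 2, 0, 2),
  A1 = c(0, 4, 8, 55, 9, 7),
  A7 = c(1, 5, 9, 0, 1, 2),
  A4 = c(2, 6, 10, 0, 3, 3),
  A2 = c(3, 7, 11, 0, 4, 0),
  row.names = paste0("id_", 1:6)
)

# Column names present in both data frames, in df1's order
matchingNames <- names(df1)[names(df1) %in% names(df2)]

# One t.test per matched column pair; SIMPLIFY = FALSE keeps a named list
tests <- mapply(t.test, df1[matchingNames], df2[matchingNames],
                SIMPLIFY = FALSE)

# Extract one p-value per column into a named numeric vector
pvals <- vapply(tests, function(tt) tt$p.value, numeric(1))
pvals
```

Each element of `tests` is a full `t.test` result, so other components such as `tests$A1$conf.int` remain available if needed.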
It is very hard to give you a good answer without a reproducible example. You also need to define what you mean by matching columns.
Here is an example of two objects (matrices here, but the same column indexing works for data.frames) that have some column names in common.
df1 <- matrix(sample(1:100, 5*5, replace=TRUE), ncol=5, nrow=5)
df2 <- matrix(sample(1:100, 5*8, replace=TRUE), ncol=8, nrow=5)
colnames(df1) <- letters[6:10]
colnames(df2) <- rev(letters[1:8])
Then I define a wrapper around t.test to limit the output to, for example, the p-value and the degrees of freedom.
f <- function(x, y) {
  test <- t.test(x, y)
  data.frame(df   = test$parameter,
             pval = test$p.value)
}
Then, using sapply, I iterate over the common columns, which I get using intersect:
sapply(intersect(colnames(df1),colnames(df2)),
function(x) f(df1[,x], df2[,x]))
f g h
df 7.85416 6.800044 7.508915
pval 0.5792354 0.2225824 0.4392895
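The `sapply` result above is a matrix whose cells are list elements, which can be inconvenient for later filtering or sorting. As a possible variant (a sketch, not the answerer's code), the per-column results can be row-bound into an ordinary data frame. The seed value and the names `m1`, `m2`, `common`, and `res` are my own choices for illustration, and the numbers will differ from the output shown above because the data are random.

```r
set.seed(42)  # arbitrary seed, only to make the random example repeatable

# Two matrices with partly overlapping column names, as in the example above
m1 <- matrix(sample(1:100, 5 * 5, replace = TRUE), ncol = 5,
             dimnames = list(NULL, letters[6:10]))
m2 <- matrix(sample(1:100, 5 * 8, replace = TRUE), ncol = 8,
             dimnames = list(NULL, rev(letters[1:8])))

# Column names present in both objects: "f", "g", "h"
common <- intersect(colnames(m1), colnames(m2))

# One row per common column, holding degrees of freedom and p-value
res <- do.call(rbind, lapply(common, function(nm) {
  tt <- t.test(m1[, nm], m2[, nm])
  data.frame(column = nm,
             df     = unname(tt$parameter),
             pval   = tt$p.value,
             stringsAsFactors = FALSE)
}))
res
```

With the results in this shape, something like `res[res$pval < 0.05, ]` selects the columns below a chosen significance threshold.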