R: Finding matching columns in two data frames for t-test statistics (R beginner)

Question

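I want to perform two-sample t-tests on my data in R. Given two high-dimensional data frames, I need to somehow loop over the matching columns (identified by the header strings in colnames()), using all rows, and run the test for each column pair, one column from df1 and one from df2. The problem is that the columns in the data frames are not in the same order, i.e. col1 from df1 does not match col1 from df2, and df2 has additional columns that do not exist in df1. I have never used R for tasks like this, and I wonder whether there is a quick and convenient way to find the matching column pairs in the data frames for the t-tests.

I thought about for loops, but I think they would be very inefficient for large data frames.

Thanks in advance for any help.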

EDIT: two small example data frames, df1 and df2

DF1

"Row\Column"    "A2"    "A1"    "A4"    "A3"
"id_1"           10      20      0       40
"id_2"           5       15      25      35
"id_3"           8       0       12      16
"id_4"           17      25      0       40

DF2

"Row\Column"    "A3"    "A8"    "A5"    "A6"    "A1"    "A7"    "A4"    "A2"
"id_1"           0       2       0       4       0       1       2       3
"id_2"           1       5       8       3       4       5       6       7
"id_3"           2       10      6       9       8       9       10      11
"id_4"           7       2       10      2       55      0       0       0
"id_5"           0       1       0       0       9       1       3       4
"id_6"           8       0       1       2       7       2       3       0

Matching columns are simply columns whose name in df1 matches a column name in df2. For example, two matching columns in df1 and df2 are "A1" and "A1", "A2" and "A2", and so on.

Answer 1

mapply is the function you are looking for.
If the columns of your data.frames already lined up, you could simply use

mapply(t.test, df1, df2)

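However, since they do not, you need to determine which column of df1 corresponds to which column of df2. Fortunately, indexing in R is smart: if you index with a vector of column names, you get those columns back in the order given. This makes life easy.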

# find the matching names
## this will give you those names in df1 that are also in df2
## and *only* such names (ie, strict intersect)
matchingNames <- names(df1)[names(df1) %in% names(df2)]

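Note that matchingNames has a particular order. Now look at what happens when you use the matchingNames vector as a column index into each of df1 and df2 (pay attention to the column order as well):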

df1[, matchingNames]
df2[, matchingNames]
matchingNames
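One caveat with the `df[, names]` form: if only a single name matches, the result is dropped to a plain vector rather than a data.frame. Below is a minimal sketch with hypothetical toy frames `x` and `y` (not the question's data) showing two ways to keep the data.frame shape.

```r
# Hypothetical toy data: the two frames share only column "b"
x <- data.frame(a = 1:3, b = 4:6)
y <- data.frame(b = 7:9, c = 10:12)

common <- names(x)[names(x) %in% names(y)]

is.data.frame(x[, common])                # FALSE: a single column is dropped to a vector
is.data.frame(x[, common, drop = FALSE])  # TRUE: drop = FALSE keeps the data.frame
is.data.frame(x[common])                  # TRUE: list-style indexing never drops
```

With two or more matching names all three forms return a data.frame, so this only matters when the overlap may be a single column.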

So we now have two data.frames with properly matched columns, which we can pass to mapply.

## mapply will apply a function to each data.frame, one pair of columns at a time

## The first argument to `mapply` is your function, in this example, `t.test`
## The second and third arguments are the data.frames (or lists) to simultaneously iterate over
mapply(t.test, df1[, matchingNames], df2[, matchingNames])
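To tie the steps together, here is a sketch that runs them end to end on the sample data from the question. It assumes the ids are stored as row names rather than as a data column. `SIMPLIFY = FALSE` and the `vapply` step that pulls out the p-values are additions for illustration, not part of the answer above.

```r
# Sample data from the question (ids assumed to be row names)
df1 <- data.frame(
  A2 = c(10, 5, 8, 17),
  A1 = c(20, 15, 0, 25),
  A4 = c(0, 25, 12, 0),
  A3 = c(40, 35, 16, 40),
  row.names = paste0("id_", 1:4)
)
df2 <- data.frame(
  A3 = c(0, 1, 2, 7, 0, 8),
  A8 = c(2, 5, 10, 2, 1, 0),
  A5 = c(0, 8, 6, 10, 0, 1),
  A6 = c(4, 3, 9, 2, 0, 2),
  A1 = c(0, 4, 8, 55, 9, 7),
  A7 = c(1, 5, 9, 0, 1, 2),
  A4 = c(2, 6, 10, 0, 3, 3),
  A2 = c(3, 7, 11, 0, 4, 0),
  row.names = paste0("id_", 1:6)
)

# Names present in both frames, in df1's column order
matchingNames <- names(df1)[names(df1) %in% names(df2)]

# One t-test per matched column pair; SIMPLIFY = FALSE keeps a plain
# named list of htest objects instead of a list-matrix
res <- mapply(t.test, df1[matchingNames], df2[matchingNames],
              SIMPLIFY = FALSE)

# Extract one p-value per column as a named numeric vector
pvals <- vapply(res, function(r) r$p.value, numeric(1))
pvals
```

Each element of `res` is a full test result, so `res$A1` prints the usual t.test summary for column A1.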

Answer 2

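Without a reproducible example, it is hard to give you a good answer. You also need to define what you mean by matching columns.

Here is an example with two matrices (the same column indexing works for data.frames) that share some column names.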

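## two matrices of random integers with partially overlapping column names
df1 <- matrix(sample(1:100, 5*5, replace = TRUE), ncol = 5, nrow = 5)
df2 <- matrix(sample(1:100, 5*8, replace = TRUE), ncol = 8, nrow = 5)
colnames(df1) <- letters[6:10]       # "f" to "j"
colnames(df2) <- rev(letters[1:8])   # "h" down to "a"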

Then I define a wrapper around t.test to limit the output to, for example, the p-value and the degrees of freedom.

f <- function(x, y) {
  test <- t.test(x, y)
  data.frame(df   = test$parameter,
             pval = test$p.value)
}

Then I use sapply to iterate over the common columns, which I find with intersect.

sapply(intersect(colnames(df1), colnames(df2)),
       function(x) f(df1[, x], df2[, x]))

     f         g         h        
df   7.85416   6.800044  7.508915 
pval 0.5792354 0.2225824 0.4392895
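The sapply call above returns a matrix whose cells are list elements, which can be awkward to filter or sort. As a possible variant, here is a sketch that uses vapply to get a plain numeric matrix instead. The seed and the names `m1`, `m2`, `common` and `out` are arbitrary choices for illustration, and the numbers will differ from the output shown above because the data are random.

```r
set.seed(42)  # arbitrary seed, only so the random example is repeatable

# Same shape of example data as above: partially overlapping column names
m1 <- matrix(sample(1:100, 5 * 5, replace = TRUE), ncol = 5,
             dimnames = list(NULL, letters[6:10]))
m2 <- matrix(sample(1:100, 5 * 8, replace = TRUE), ncol = 8,
             dimnames = list(NULL, rev(letters[1:8])))

common <- intersect(colnames(m1), colnames(m2))

# vapply checks that every call returns exactly two numbers and
# assembles them into a numeric matrix (one column per tested name)
out <- vapply(common, function(nm) {
  tt <- t.test(m1[, nm], m2[, nm])
  c(df = unname(tt$parameter), pval = tt$p.value)
}, FUN.VALUE = c(df = 0, pval = 0))

t(out)  # one row per column name, with columns df and pval
```

Because `out` is numeric, something like `out["pval", ] < 0.05` works directly without unlisting.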

Answer 1: 4, 2013-04-07 17:13:13

Answer 2: 0, accepted, 2013-04-07 17:20:31
