How can I perform a pairwise t.test in R across multiple independent vectors?

I have vectors X1, X2, X3, ... Xn. I want to test whether the mean of any one vector is significantly different from the mean of any other vector, for every possible pair of vectors. I am seeking a better way to do this in R than running n^2 individual t.tests.

I have a data frame full of census data for a particular CSA. Each row contains observations for each variable (column) for a particular census tract.

What I need to do is compare means for the same variable across census tracts in different MSAs. In other words, I want to group my data.frame by the MSA designation variable (which is one of the columns) and then compare the means of another variable of interest pairwise across the resulting MSA groups. This is essentially doing pairwise t.tests across each resulting vector, but I wish to do this in a more elegant way than writing t.test(MSAx, MSAy) over and over again. How can I do this?

The advantage of my method below over the one proposed by @ashkan is that mine removes duplicates (i.e. either X1 vs X2 or X2 vs X1 will appear in the results, not both).

# Generate dummy data
df <- data.frame(matrix(rnorm(100), ncol = 10))
colnames(df) <- paste0("X", 1:10)

# Create combinations of the variables
combinations <- combn(colnames(df),2, simplify = FALSE)

# Do the t.test
results <- lapply(seq_along(combinations), function(n) {
  pair <- combinations[[n]]
  # Welch two-sample t-test between the two columns of this pair
  t.test(df[[pair[1]]], df[[pair[2]]])
})

# Rename list for legibility
names(results) <- sapply(combinations, paste, collapse = " vs. ")
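The list of test objects above can be hard to scan, so here is a minimal sketch of collecting the p-values into one table. The fixed seed, the name `summary_table`, and the choice of the Holm correction are assumptions for illustration and are not part of the original answer.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# Same dummy data and pairing as in the answer above
df <- data.frame(matrix(rnorm(100), ncol = 10))
colnames(df) <- paste0("X", 1:10)
combinations <- combn(colnames(df), 2, simplify = FALSE)

results <- lapply(combinations, function(pair) {
  t.test(df[[pair[1]]], df[[pair[2]]])
})
names(results) <- sapply(combinations, paste, collapse = " vs. ")

# One row per pair: raw p-value plus a Holm-adjusted p-value
p_raw <- sapply(results, function(r) r$p.value)
summary_table <- data.frame(
  comparison = names(results),
  p_value    = unname(p_raw),
  p_holm     = unname(p.adjust(p_raw, method = "holm"))
)
head(summary_table)
```

Adjusting for multiple comparisons matters here because 10 columns already produce 45 tests.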

Just use pairwise.t.test. Here is an example:

x1 <- rnorm(50)
x2 <- rnorm(30, mean=0.2)
x3 <- rnorm(100,mean=0.1)
x4 <- rnorm(100,mean=0.4)

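x <- data.frame(data = c(x1, x2, x3, x4),
                key = c(rep("x1", length(x1)),
                        rep("x2", length(x2)),
                        rep("x3", length(x3)),
                        rep("x4", length(x4))))

# Pairwise tests without a pooled SD; p-values are Holm-adjusted by default
pairwise.t.test(x$data, x$key, pool.sd = FALSE)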

#   Pairwise comparisons using t tests with non-pooled SD 
# data:  x$data and x$key 
#    x1     x2     x3    
# x2 0.7395 -      -     
# x3 0.9633 0.9633 -     
# x4 0.0067 0.9633 0.0121
# P value adjustment method: holm 
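For the census case in the question, the data are already in long format, so the grouping column can be passed directly. The sketch below assumes a hypothetical data frame `tracts` with made-up columns `msa` and `income`; these names and the simulated values are placeholders, not the asker's real data.

```r
set.seed(42)  # assumption: fixed seed for reproducibility

# Hypothetical stand-in for the census data: one row per tract
tracts <- data.frame(
  msa    = rep(c("MSA_A", "MSA_B", "MSA_C"), times = c(40, 25, 35)),
  income = c(rnorm(40, mean = 50), rnorm(25, mean = 52), rnorm(35, mean = 60))
)

# Compare the mean of `income` between every pair of MSAs
res <- pairwise.t.test(tracts$income, tracts$msa,
                       pool.sd = FALSE, p.adjust.method = "holm")

# Lower-triangular matrix of adjusted p-values (NA above the diagonal)
res$p.value
```

Each pair appears only once in `res$p.value`, so there are no duplicate comparisons to filter out.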

If you have a data.frame and you wish to perform t-tests independently between each pair of its columns, you can use a nested apply loop:

apply(MSA, 2, function(x1) {
  apply(MSA, 2, function(x2) {
    t.test(x1, x2)
  })
})
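If only the p-values are needed, the same nested loop can return a square matrix instead of nested lists of test objects. This is a rough sketch in which the small simulated `MSA` data frame and the name `p_matrix` are assumptions for illustration.

```r
set.seed(7)  # assumption: fixed seed for reproducibility

# Hypothetical stand-in for MSA: one numeric column per group
MSA <- data.frame(A = rnorm(30),
                  B = rnorm(30, mean = 0.5),
                  C = rnorm(30, mean = 1))

# p_matrix[i, j] is the p-value of t.test(column j, column i)
p_matrix <- sapply(MSA, function(x1) {
  sapply(MSA, function(x2) t.test(x1, x2)$p.value)
})
round(p_matrix, 4)
```

The matrix is symmetric with a diagonal of 1, which shows that this approach runs every comparison twice and also tests each column against itself.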

A good visualization to accompany such a brute force approach would be a forest plot:

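# 95% confidence interval for each column mean (normal approximation, using the standard error)
cis <- apply(MSA, 2, function(x) mean(x) + c(-1, 1) * 1.96 * sd(x) / sqrt(length(x)))
plot.new()  # a plot must be started before plot.window() can be called
plot.window(xlim = c(1, ncol(cis)), ylim = range(cis))
segments(1:ncol(cis), cis[1, ], 1:ncol(cis), cis[2, ])
axis(1, at = 1:ncol(cis), labels = colnames(MSA))
axis(2)
# Overall mean across all columns; as.matrix() is needed when MSA is a data.frame
abline(h = mean(as.matrix(MSA)), lty = 'dashed')
title('Forest plot of 95% confidence intervals of MSA')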

In addition to the response from quarzgar, there is another way to perform pairwise t-tests across multiple factors in R. The trick is to combine the levels of the two (or more) factors into a single factor.

Example with a 2x2 classical design:

df <- data.frame(Id=c(rep(1:100,2),rep(101:200,2)),

summary(aov(dv~Group*Condition+Error(Id/Condition),data = df))

# Post-hoc comparisons across all factor combinations
df$posthoclevels <- paste(df$Group, df$Condition)  # combine factor levels
pairwise.t.test(df$dv, df$posthoclevels)

#   Pairwise comparisons using t tests with pooled SD 
# data:  df$dv and df$posthoclevels 
#                 Control Post Control Pre Experimental Post
# Control Pre       0.60         -           -                
# Experimental Post <2e-16       <2e-16      -                
# Experimental Pre  0.26         0.47        <2e-16           
# P value adjustment method: holm 
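The data frame definition in this answer is cut off, so here is a self-contained sketch of the same idea. Everything about the simulated data is an assumption: the group size `n`, the ordering of `Group` and `Condition`, and the effect of 2 added to the Experimental/Post cell are invented for illustration and will not reproduce the p-values shown above.

```r
set.seed(123)  # assumption: fixed seed for reproducibility
n <- 100       # assumption: participants per group

df <- data.frame(
  Id        = c(rep(1:n, 2), rep((n + 1):(2 * n), 2)),
  Group     = rep(c("Control", "Experimental"), each = 2 * n),
  Condition = rep(rep(c("Pre", "Post"), each = n), times = 2),
  dv        = rnorm(4 * n)
)

# Assumed effect: only the Experimental/Post cell is shifted upward
shifted <- df$Group == "Experimental" & df$Condition == "Post"
df$dv[shifted] <- df$dv[shifted] + 2

# Combine both factors into one and compare every pair of cells
df$posthoclevels <- paste(df$Group, df$Condition)
posthoc <- pairwise.t.test(df$dv, df$posthoclevels)
posthoc$p.value
```

Note that this call treats the four cells as independent groups with a pooled SD, so it does not use the repeated-measures structure that the `Error(Id/Condition)` term models in the `aov` call.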

