
How is it that a corrected pairwise comparison yields a lower p.value than a single t.test?

Hi, suppose I have these results:

df <- structure(list(len = c(4.2, 11.5, 7.3, 5.8, 6.4, 10, 11.2, 11.2, 
5.2, 7, 15.2, 21.5, 17.6, 9.7, 14.5, 10, 8.2, 9.4, 16.5, 9.7, 
16.5, 16.5, 15.2, 17.3, 22.5, 17.3, 13.6, 14.5, 18.8, 15.5, 19.7, 
23.3, 23.6, 26.4, 20, 25.2, 25.8, 21.2, 14.5, 27.3, 23.6, 18.5, 
33.9, 25.5, 26.4, 32.5, 26.7, 21.5, 23.3, 29.5, 25.5, 26.4, 22.4, 
24.5, 24.8, 30.9, 26.4, 27.3, 29.4, 23), supp = structure(c(2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("OJ", 
"VC"), class = "factor"), dose = structure(c(1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("D0.5", "D1", "D2"
), class = "factor")), row.names = c(1L, 2L, 3L, 4L, 5L, 6L, 
7L, 8L, 9L, 10L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 
40L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 41L, 42L, 
43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 21L, 22L, 23L, 24L, 25L, 
26L, 27L, 28L, 29L, 30L, 51L, 52L, 53L, 54L, 55L, 56L, 57L, 58L, 
59L, 60L), class = "data.frame") 

df$int <- interaction(df$supp, df$dose)
e <- pairwise.t.test(df$len, df$int, p.adjust.method="BH")

So, from the output:

        OJ.D0.5          VC.D0.5            OJ.D1     VC.D1            OJ.D2  
VC.D0.5 0.00285          -                  -         -                -      
OJ.D1   0.00000079391014 0.00000000000984   -         -                -      
VC.D1   0.04207          0.00000243821908   0.00088   -                -      
OJ.D2   0.00000000042891 0.00000000000001   0.04645   0.00000089414918 -      
VC.D2   0.00000000042891 0.00000000000001   0.04474   0.00000085310153 0.96089

the comparison VC.D1 vs OJ.D1 gives 0.00088.

However, a single t.test

t.test(df[df$supp == "VC" & df$dose == "D1", ]$len, 
       df[df$supp == "OJ" & df$dose == "D1", ]$len)

yields p-value = 0.001038.

So I must have messed up somewhere, because shouldn't an adjusted p-value be greater than a single, uncorrected p-value?

Solution

You'll get the same result when you set p.adjust.method = "none" and pool.sd = FALSE:

pairwise.t.test(df$len, df$int, p.adjust.method = "none", pool.sd = FALSE)$p.value[3,3]
# 0.001038376

t.test(df[df$supp == "VC" & df$dose == "D1", ]$len, 
       df[df$supp == "OJ" & df$dose == "D1", ]$len)$p.value
# 0.001038376

Notes

  1. Just a reminder to always read the documentation carefully and perform some sanity checks, to make sure the function does what you think it does.
  2. This only illustrates where the difference comes from. How to run it in your case will depend on your familiarity with the data.

Explanation

The comparison becomes much easier when we simply don't apply multiple testing correction. In that case, both should give the same p-value, right? So let's compare using p.adjust.method = "none". Running pairwise.t.test now gives 0.00059, which still does not match 0.001038 (it is actually smaller). So the correction is not the culprit: BH can only raise a p-value, and it did raise this one from 0.00059 to 0.00088 (roughly 0.00059 × 15/10, since it is the 10th smallest of the 15 comparisons). The unadjusted p-value from pairwise.t.test was simply smaller than the t.test one to begin with.
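A minimal sketch of that reasoning. It assumes that df is the built-in ToothGrowth data set with dose relabelled (the len values in the question's dput match it group for group), so the snippet rebuilds df from it to stay self-contained. It applies p.adjust(..., method = "BH") by hand to the unadjusted pooled-SD p-values, the same way pairwise.t.test adjusts its 15 lower-triangle entries, and checks that BH never lowers a p-value.

```r
# Assumption: df is ToothGrowth with dose relabelled D0.5 / D1 / D2
# (row order differs from the question, but every group has the same values).
df <- ToothGrowth
df$dose <- factor(df$dose, labels = c("D0.5", "D1", "D2"))
df$int <- interaction(df$supp, df$dose)

# Unadjusted and BH-adjusted p-values, both with the default pool.sd = TRUE
raw <- pairwise.t.test(df$len, df$int, p.adjust.method = "none")$p.value
bh  <- pairwise.t.test(df$len, df$int, p.adjust.method = "BH")$p.value

# The 15 computed comparisons live in the lower triangle (diagonal included)
lt <- lower.tri(raw, diag = TRUE)
by_hand <- raw
by_hand[lt] <- p.adjust(raw[lt], method = "BH")

print(all.equal(by_hand, bh))   # hand adjustment vs. pairwise.t.test's table
print(all(bh[lt] >= raw[lt]))   # BH never makes a p-value smaller
# The comparison from the question: raw pooled-SD p-value and its BH adjustment
print(c(raw = raw["VC.D1", "OJ.D1"], BH = bh["VC.D1", "OJ.D1"]))
```

The point is that the adjusted 0.00088 is larger than its own unadjusted value; it is only smaller than the Welch t.test p-value, which is a different test.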

The problem stems from the pool.sd argument, which defaults to TRUE for unpaired tests. It forces every comparison to use one common standard deviation, pooled from all six groups, whereas t.test by default runs a Welch test that uses only the variances of the two groups being compared. Pooling is useful in general (if the equal-variance assumption is met), but it does lead to different p-values.
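To see how much pool.sd alone moves the numbers, here is a small side-by-side sketch with no multiplicity correction at all. As before, it assumes df can be rebuilt from the built-in ToothGrowth data; the comparison data frame and its column names are just illustrative.

```r
# Assumption: df is ToothGrowth with dose relabelled (same groups as the question)
df <- ToothGrowth
df$dose <- factor(df$dose, labels = c("D0.5", "D1", "D2"))
df$int <- interaction(df$supp, df$dose)

# Same 15 comparisons, no adjustment: pooled SD vs. per-pair (Welch) t tests
pooled <- pairwise.t.test(df$len, df$int, p.adjust.method = "none",
                          pool.sd = TRUE)$p.value
separate <- pairwise.t.test(df$len, df$int, p.adjust.method = "none",
                            pool.sd = FALSE)$p.value

lt <- lower.tri(pooled, diag = TRUE)
# Label each computed cell as "row vs column", in the same order as pooled[lt]
pair <- outer(rownames(pooled), colnames(pooled), paste, sep = " vs ")[lt]
comparison <- data.frame(pair = pair,
                         pooled_sd = signif(pooled[lt], 3),
                         separate_sd = signif(separate[lt], 3))
print(comparison)
```

The VC.D1 vs OJ.D1 row should show the pooled value of about 0.00059 next to the Welch value of 0.001038 that t.test reports.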

When we look at the underlying code, this becomes clear:

    if (pool.sd) {
        METHOD <- "t tests with pooled SD"
        xbar <- tapply(x, g, mean, na.rm = TRUE)
        s <- tapply(x, g, sd, na.rm = TRUE)
        n <- tapply(!is.na(x), g, sum)
        degf <- n - 1
        total.degf <- sum(degf)
        pooled.sd <- sqrt(sum(s^2 * degf)/total.degf)
        compare.levels <- function(i, j) {
            dif <- xbar[i] - xbar[j]
            se.dif <- pooled.sd * sqrt(1/n[i] + 1/n[j])
            t.val <- dif/se.dif
            if (alternative == "two.sided") 
                2 * pt(-abs(t.val), total.degf)
            else pt(t.val, total.degf, lower.tail = (alternative == 
                "less"))
        }
    }

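Among other things, the code adds up the degrees of freedom of all groups (total.degf, the sum of n - 1 over the six groups, i.e. 6 × 9 = 54 here) and uses them to compute a pooled standard deviation (pooled.sd). The t statistic is then compared with a t distribution on total.degf degrees of freedom, instead of the Welch degrees of freedom that t.test derives from the two groups alone.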
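The pooled branch can be reproduced by hand for the single comparison in the question. This is a sketch that mirrors the excerpt above (same variable names, copied from it for readability), again assuming df is the relabelled ToothGrowth data.

```r
# Assumption: df is ToothGrowth with dose relabelled (same groups as the question)
df <- ToothGrowth
df$dose <- factor(df$dose, labels = c("D0.5", "D1", "D2"))
df$int <- interaction(df$supp, df$dose)

x <- df$len
g <- df$int
xbar <- tapply(x, g, mean)
s    <- tapply(x, g, sd)
n    <- tapply(x, g, length)
degf <- n - 1
total.degf <- sum(degf)                           # 6 groups x (10 - 1) = 54
pooled.sd  <- sqrt(sum(s^2 * degf) / total.degf)  # df-weighted average variance

# VC.D1 vs OJ.D1 using the pooled SD and total.degf, as in the excerpt
se.dif   <- pooled.sd * sqrt(1 / n[["VC.D1"]] + 1 / n[["OJ.D1"]])
t.val    <- (xbar[["VC.D1"]] - xbar[["OJ.D1"]]) / se.dif
p.pooled <- 2 * pt(-abs(t.val), total.degf)

# Unadjusted value from pairwise.t.test (default pool.sd = TRUE)
p.table <- pairwise.t.test(x, g, p.adjust.method = "none")$p.value["VC.D1", "OJ.D1"]
print(c(total.degf = total.degf, by_hand = p.pooled, pairwise = p.table))
```

Both the standard error and the degrees of freedom now draw on all 60 observations rather than the 20 in the two groups, which is why this p-value differs from the two-sample t.test.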

When we set pool.sd = FALSE, the code simply calls t.test for each pair:

    else {
        METHOD <- if (paired) 
            "paired t tests"
        else "t tests with non-pooled SD"
        compare.levels <- function(i, j) {
            xi <- x[as.integer(g) == i]
            xj <- x[as.integer(g) == j]
            t.test(xi, xj, paired = paired, alternative = alternative, 
                ...)$p.value
        }
    }
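As a sanity check in the spirit of the first note, the following sketch re-runs every one of the 15 comparisons as a standalone t.test and confirms it matches pairwise.t.test with pool.sd = FALSE and no adjustment. It assumes df is the relabelled ToothGrowth data, and the checked counter is only there to make the check visible.

```r
# Assumption: df is ToothGrowth with dose relabelled (same groups as the question)
df <- ToothGrowth
df$dose <- factor(df$dose, labels = c("D0.5", "D1", "D2"))
df$int <- interaction(df$supp, df$dose)

p.none <- pairwise.t.test(df$len, df$int, p.adjust.method = "none",
                          pool.sd = FALSE)$p.value

checked <- 0L
for (i in rownames(p.none)) {
  for (j in colnames(p.none)) {
    if (is.na(p.none[i, j])) next   # upper triangle is never computed
    # Default t.test: unpaired Welch test on the two groups only
    p.single <- t.test(df$len[df$int == i], df$len[df$int == j])$p.value
    stopifnot(isTRUE(all.equal(p.single, p.none[i, j])))
    checked <- checked + 1L
  }
}
message(checked, " comparisons match t.test()")
```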

