简体   繁体   English

R 中 wilcox_test() 和 wilcox.test() 之间的不同 P 值

[英]Different P-values between wilcox_test() & wilcox.test() in R

While I was trying to perform a paired Wilcoxon test for the variable:"Metabolite", I noticed different P-values between wilcox_test() & wilcox.test(), when I tested the following variable:当我尝试对变量执行配对 Wilcoxon 测试时:“代谢物”,当我测试以下变量时,我注意到 wilcox_test() 和 wilcox.test() 之间的 P 值不同:

structure(list(Visit = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L), .Label = c("BSL", "LV"), class = "factor"), Metabolite = c(NA, 
9.602, 9.0102, 4.524, 3.75, NA, 6.596, 7.065, 6.877, NA, NA, 
10.1846, 13.521, 7.8219, NA, 4.9149, 4.0754, 4.7635, 8.8554, 
4.3442, NA, 16.659, NA, 3.698, 6.623, 5.158, 11.719, 3.206, NA, 
2.225, 7.417, 1.42, NA, NA, 2.752, 6.504, 7.594, 6.652, NA, NA, 
3.784, 2.7311, 4.1749, 2.6659, 0.5592, NA, 4.2326, 4.3808, 3.624, 
4.29, 7.098, 6.532, 3.699, 9.297, 8.275, NA)), class = "data.frame", row.names = c(NA, 
-56L))



# The P-value derived from result 1 (p=0.0079) is different from that from result 2 (p=0.003279):


        result1 <- wilcox_test(data=Data_pairs,  Metabolite~Visit, paired = TRUE)
        
        # A tibble: 1 x 7
        #.y.     group1 group2    n1    n2 statistic      p
        #* <chr>   <chr>  <chr>  <int> <int>     <dbl>  <dbl>
        #        1 Metabolite BSL    LV        28    28       131 0.0079
        
        
        result2 <- wilcox.test(data=Data_pairs,  Metabolite~Visit, paired = TRUE)
        #   Wilcoxon signed rank exact test
        
        #data:  Metabolite by Visit
        #V = 197, p-value = 0.003279
        #alternative hypothesis: true location shift is not equal to 0
 

Using different statistical software, it seems that the wrong P-value is that derived from result2.使用不同的统计软件,似乎错误的 P 值是从 result2 得出的。

Is there any suggestion/advice on how to correct my code, and what is the reason for this difference?关于如何更正我的代码有什么建议/建议,这种差异的原因是什么?

Thank you in advance for your help.预先感谢您的帮助。

You should point out you are using rstatix to avoid the confusion:您应该指出您正在使用 rstatix 以避免混淆:

rstatix::wilcox_test(data=Data_pairs,  Metabolite~Visit, paired = TRUE)
# A tibble: 1 x 7
  .y.        group1 group2    n1    n2 statistic      p
* <chr>      <chr>  <chr>  <int> <int>     <dbl>  <dbl>
1 Metabolite BSL    LV        28    28       131 0.0079

If it is indeed paired data, then you need to ensure you provide the complete pairs, taking your data, we can place them side by side, assuming the your observations 1 to 28 and from the same sample as 29 to 56:如果确实是配对数据,那么您需要确保您提供完整的配对,获取您的数据,我们可以将它们并排放置,假设您的观察值 1 到 28 并且来自与 29 到 56 相同的样本:

da = do.call(cbind,split(Data_pairs$Metabolite,Data_pairs$Visit))
da

          BSL     LV
 [1,]      NA     NA
 [2,]  9.6020 2.2250
 [3,]  9.0102 7.4170
 [4,]  4.5240 1.4200
 [5,]  3.7500     NA
 [6,]      NA     NA
 [7,]  6.5960 2.7520
 [8,]  7.0650 6.5040
 [9,]  6.8770 7.5940
[10,]      NA 6.6520
[11,]      NA     NA
[12,] 10.1846     NA
[13,] 13.5210 3.7840
[14,]  7.8219 2.7311
[15,]      NA 4.1749
[16,]  4.9149 2.6659
[17,]  4.0754 0.5592
[18,]  4.7635     NA
[19,]  8.8554 4.2326
[20,]  4.3442 4.3808
[21,]      NA 3.6240
[22,] 16.6590 4.2900
[23,]      NA 7.0980
[24,]  3.6980 6.5320
[25,]  6.6230 3.6990
[26,]  5.1580 9.2970
[27,] 11.7190 8.2750
[28,]  3.2060     NA

We test only the complete:我们只测试完整的:

wilcox.test(da[complete.cases(da),1],da[complete.cases(da),2],paired=TRUE)

data:  da[complete.cases(da), 1] and da[complete.cases(da), 2]
V = 131, p-value = 0.007904
alternative hypothesis: true location shift is not equal to 0

There's a few things going on here.这里发生了一些事情。 The most important is you're not doing a paired test with coin::wilcox_test (assuming you're using the coin package - we can't be sure from your question).最重要的是您没有使用coin::wilcox_test进行配对测试(假设您使用的是coin package - 我们无法从您的问题中确定)。 There is no paired argument for the function. function 没有配对参数。 If we compare coin::wilcox_test with the exact distribution and the unpaired wilcox.test , we'll get the same result.如果我们将coin::wilcox_test与精确分布和未配对的wilcox.test进行比较,我们将得到相同的结果。 There will also be the same result with the argument paired = TRUE supplied to coin::wilcox_test .提供给coin::wilcox_test的参数paired = TRUE也会有相同的结果。

coin::wilcox_test(data = Data_pairs,  Metabolite ~ Visit, distribution = "exact")
#> 
#>  Exact Wilcoxon-Mann-Whitney Test
#> 
#> data:  Metabolite by Visit (BSL, LV)
#> Z = 2.503, p-value = 0.01167
#> alternative hypothesis: true mu is not equal to 0
wilcox.test(data = Data_pairs, Metabolite ~ Visit)
#> 
#>  Wilcoxon rank sum exact test
#> 
#> data:  Metabolite by Visit
#> W = 320, p-value = 0.01167
#> alternative hypothesis: true location shift is not equal to 0
coin::wilcox_test(data = Data_pairs,  Metabolite ~ Visit, paired = TRUE, distribution = "exact")
#> 
#>  Exact Wilcoxon-Mann-Whitney Test
#> 
#> data:  Metabolite by Visit (BSL, LV)
#> Z = 2.503, p-value = 0.01167
#> alternative hypothesis: true mu is not equal to 0

However, the p-values with default options will not be the same between the functions regardless.但是,无论如何,具有默认选项的 p 值在函数之间不会相同。 The p-values derived from coin::wilcox_test are asymptotic and computed from permutation tests.coin::wilcox_test派生的 p 值是渐近的,并通过置换测试计算得出。 The p-values from wilcox.test are exact p-values.来自wilcox.test的 p 值是精确的 p 值。 You can get exact p-values from coin::wilcox_test by specifying the distribution = "exact" as I did above.您可以通过像我上面所做的那样指定distribution = "exact"coin::wilcox_test获得精确的 p 值。 Comparing the default options, you'll see they differ.比较默认选项,您会发现它们不同。

coin::wilcox_test(data = Data_pairs,  Metabolite ~ Visit)
#> 
#>  Asymptotic Wilcoxon-Mann-Whitney Test
#> 
#> data:  Metabolite by Visit (BSL, LV)
#> Z = 2.503, p-value = 0.01231
#> alternative hypothesis: true mu is not equal to 0
wilcox.test(data = Data_pairs,  Metabolite ~ Visit)
#> 
#>  Wilcoxon rank sum exact test
#> 
#> data:  Metabolite by Visit
#> W = 320, p-value = 0.01167
#> alternative hypothesis: true location shift is not equal to 0

I would spend some time looking through the functions with ?coin::wilcox_test or ?wilcox.test and making sure you understand what test you want to apply and what results they supply.我会花一些时间查看?coin::wilcox_test?wilcox.test的函数,并确保您了解要应用的测试以及它们提供的结果。

To maintain reproducibility you should always use set.seed(1) in your R code.为了保持可重复性,您应该始终在 R 代码中使用set.seed(1) Maybe this is the case?也许是这样?

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM