简体   繁体   中英

Strange adjusted (pseudo) p-values when running pairwise adonis

I really need some input on an R bioinformatics issue. I suspect this might be a case of me not completely understanding the adjustment step when doing a pairwise anova (using this wrapper https://github.com/pmartinezarbizu/pairwiseAdonis ). I've been googling around for a while now, and have not found any answer, so next step is trying here.

I've been using pairwise.adonis as a post hoc test after running the adonis test from the package vegan.

I've come across something I find a little strange, and I'm not sure if it's a bug, or if I've done something wrong, or if the data is just weird like that - any inputs would be much appreciated.

When I run the pairwise.adonis like this:

post_hoc_permanova <- pairwise.adonis(t(otu), meta$Fungicide_treatment, sim.function = "vegdist", 
sim.method = "bray", p.adjust.m = "fdr", reduce = NULL, perm = 999) 

I get this (sorry if the paste is a bit messy, not sure how to make it look pretty here):

post_hoc_permanova 

pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
Bentonite vs Esquive 1 0.4309946 2.1228348 0.17511043 0.059 0.2400000
Bentonite vs Tessior 1 0.2354402 1.1522300 0.10331835 0.266 0.3855556
Bentonite vs Control water 1 0.2028768 1.0137438 0.09204353 0.346 0.3855556
Bentonite vs Copper oxychloride 1 0.1778650 1.0765919 0.09719523 0.347 0.3855556
Esquive vs Tessior 1 0.4197328 1.8764408 0.15799690 0.068 0.2400000
Esquive vs Control water 1 0.3258340 1.4845894 0.12926796 0.187 0.3740000
Esquive vs Copper oxychloride 1 0.4070595 2.2055388 0.18069983 0.072 0.2400000
Tessior vs Control water 1 0.2583331 1.1700713 0.10475057 0.288 0.3855556
Tessior vs Copper oxychloride 1 0.3041160 1.6361852 0.14061182 0.113 0.2825000
Control water vs Copper oxychloride 1 0.1205611 0.6636605 0.06223571 0.594 0.594000

I get all different p-values, but when they're adjusted, suddenly a lot of them have exactly the same adjusted values (egpadjusted sig = 0.2400000 for both p.value=0.059, 0.068 and 0.072). I'm at a loss to explain why this happens - if adjusting by any standard, the values would still be slightly different, depending on the "original" p-value, wouldn't they? If anyone can enlighten me, I'm all ears.

My dataset looks like this (full tables available here: https://github.com/Marieag/LeaSyBiome ):

head(meta)
              
SampleID Real_Sample_Name Year Location Cultivar Fungicide_treatment 
GF.ITS.VL31 GF.ITS.VL31             <NA> 2022   Lisbon    Syrah           Bentonite 
GF.ITS.VL32 GF.ITS.VL32             <NA> 2022   Lisbon    Syrah           Bentonite 
GF.ITS.VL33 GF.ITS.VL33             <NA> 2022   Lisbon    Syrah           Bentonite 
GF.ITS.VL34 GF.ITS.VL34             <NA> 2022   Lisbon    Syrah           Bentonite 
GF.ITS.VL35 GF.ITS.VL35             <NA> 2022   Lisbon    Syrah           Bentonite 
GF.ITS.VL36 GF.ITS.VL36             <NA> 2022   Lisbon    Syrah           Bentonite
head(otu)
                    
GF.ITS.VL31  GF.ITS.VL32 GF.ITS.VL33  GF.ITS.VL34  GF.ITS.VL35 GF.ITS.VL36 GF.ITS.VL37  GF.ITS.VL38 
g__Kondoa            0.000000000 0.0000000000 0.000000000 0.0000000000 0.0000000000           0 0.000000000 0.0000000000 
g__Sarocladium       0.000000000 0.0000000000 0.000000000 0.0000000000 0.0000000000           0 0.000000000 0.0000000000 
g__Symmetrospora_1   0.001181818 0.0000000000 0.000000000 0.0000000000 0.0000000000           0 0.000000000 0.0000000000 
o__Capnodiales       0.000000000 0.0006363636 0.004272727 0.0006363636 0.0004545455           0 0.001363636 0.0008181818 
s__Candida_cretensis 0.000000000 0.0000000000 0.000000000 0.0000000000 0.0000000000           0 0.000000000 0.0000000000 
c__Leotiomycetes     0.000000000 0.0000000000 0.001454545 0.0086363636 0.0008181818           0 0.001363636 0.0012727273 
GF.ITS.VL39  GF.ITS.VL40  GF.ITS.VL41  GF.ITS.VL42  GF.ITS.VL43  GF.ITS.VL44  GF.ITS.VL45 
g__Kondoa            0.000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.000000e+00 0.000000e+00 
g__Sarocladium       0.000000000 0.0000000000 0.0000000000 0.0025454545 0.0000000000 0.000000e+00 0.000000e+00 
g__Symmetrospora_1   0.000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 9.090909e-05 0.000000e+00 
o__Capnodiales       0.000000000 0.0005454545 0.0004545455 0.0002727273 0.0042727273 5.181818e-03 9.090909e-05 
s__Candida_cretensis 0.000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.000000e+00 0.000000e+00 
c__Leotiomycetes     0.001454545 0.0000000000 0.0000000000 0.0000000000 0.0006363636 1.818182e-04 0.000000e+00 
GF.ITS.VL46 GF.ITS.VL47  GF.ITS.VL48  GF.ITS.VL49  GF.ITS.VL50  GF.ITS.VL51 GF.ITS.VL52 
g__Kondoa            0.0000000000           0 0.0000000000 0.0000000000 0.0000000000 0.0000000000           0 
g__Sarocladium       0.0000000000           0 0.0000000000 0.0000000000 0.0000000000 0.0000000000           0 
g__Symmetrospora_1   0.0000000000           0 0.0000000000 0.0000000000 0.0000000000 0.0000000000           0 
o__Capnodiales       0.0007272727           0 0.0000000000 0.0003636364 0.0006363636 0.0000000000           0 
s__Candida_cretensis 0.0000000000           0 0.0000000000 0.0000000000 0.0000000000 0.0000000000           0 
c__Leotiomycetes     0.0000000000           0 0.0007272727 0.0000000000 0.0000000000 0.0001818182           0 
GF.ITS.VL53 GF.ITS.VL54 GF.ITS.VL56 GF.ITS.VL57  GF.ITS.VL58  GF.ITS.VL59 GF.ITS.VL60 
g__Kondoa                      0           0 0.000000000           0 9.090909e-05 0.0000000000           0 
g__Sarocladium                 0           0 0.000000000           0 0.000000e+00 0.0000000000           0 
g__Symmetrospora_1             0           0 0.000000000           0 0.000000e+00 0.0000000000           0 
o__Capnodiales                 0           0 0.001727273           0 3.636364e-04 0.0000000000           0 
s__Candida_cretensis           0           0 0.000000000           0 3.818182e-03 0.0002727273           0 
c__Leotiomycetes               0           0 0.000000000           0 0.000000e+00 0.0000000000           0

I hope someone out there can help me. Thanks!

The 'adjusted' values are often the same because of how the Benjamini-Hochberg method works.

The Benjamini-Hochberg adjusted p-values are calculated from the unadjusted p-values as follows:

  1. First order the p-values from smallest to largest.
  2. Multiply each by N/i , where N is the total number and i is the rank order
  3. Set any values bigger than one to be equal to 1
  4. Finally make sure the sequence always increases, by setting the adjusted p-value equal the subsequent value if the subsequent value is lower.

So your p-values are:

ps = c(0.059, 0.266, 0.346, 0.347, 0.068, 0.187, 0.072, 0.288, 0.113, 0.594)

if we sort them:

> p2 <- ps[order(ps)]
> p2
 [1] 0.059 0.068 0.072 0.113 0.187 0.266 0.288 0.346 0.347 0.594

now make the adjustment:

> pa <- p2 * 10/(1:10)
> pa
 [1] 0.5900000 0.3400000 0.2400000 0.2825000 0.3740000 0.4433333 0.4114286 0.4325000 0.3855556 0.5940000

You'll see that the 3rd value is 0.24, so we set the first and the second to 0.24 as well (because they were higher). The ninth value is 0.3856, so we set the 6th to 8th to 0.3856 as well. Then the final sequence will be the BH corrected values:

> p.adjust(p2[order(p2)], method="BH")
 [1] 0.2400000 0.2400000 0.2400000 0.2825000 0.3740000 0.3855556 0.3855556 0.3855556 0.3855556 0.5940000

And this is what you see in your table.

The wisdom of using a BH correction for post-hoc ANOVA comparisons is another question altogether!

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM