
Apply p.adjust function after group_by in dplyr

My data frame dat looks like this:

set.seed(123)
dat <- data.frame(
    comp = rep(1:4, 2),
    grp  = rep(c('A', 'B'), each = 4),
    pval = runif(8, min = 0, max = 0.1)
)
dat$pval[sample(nrow(dat), 1)] <- NA

The pval column contains p-values from multiple t-tests within each large group.
Now I need to apply the base R function p.adjust to adjust the p-values within each group (A, B, ...). What I did was:

library(dplyr)

dat %>%
    group_by(grp) %>%
    mutate(pval.adj = p.adjust(pval, method = 'BH'))

Below is the output of the above code:

comp grp       pval   pval.adj
   1   A 0.02875775 0.08179538
   2   A 0.07883051 0.08830174
   3   A 0.04089769 0.08179538
   4   A 0.08830174 0.08830174
   1   B         NA         NA
   2   B 0.00455565 0.01366695
   3   B 0.05281055 0.07921582
   4   B 0.08924190 0.08924190

The result does not make sense. For the last entry of each group, pval and pval.adj are equal, and some pval.adj values are much closer to pval than others. I think something is wrong with applying the p.adjust function after group_by. It took me hours but I could not figure out why... I would appreciate it if someone could help me with that.

Below is the p.adjust function usage:

p.adjust(p, method = p.adjust.methods, n = length(p))
p.adjust.methods
# c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY",
#   "fdr", "none")
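The usage line above gives n = length(p) as the default, yet the NA in group B does not seem to be counted: in the output, 0.01366695 is 3 × 0.00455565, i.e. n = 3 rather than 4. A small base R check, reusing group B's printed p-values (rounded to 8 decimals, so the last digit can differ slightly from the question's output):

```r
# Group B's p-values as printed in the question, including the missing one
pB <- c(NA, 0.00455565, 0.05281055, 0.08924190)

# Default n: the NA stays in place but is not counted, so n is effectively 3
adj_default <- p.adjust(pB, method = "BH")

# Dropping the NA first gives the same three adjusted values
adj_dropped <- p.adjust(pB[!is.na(pB)], method = "BH")

# Forcing n = 4 treats the missing test as a real comparison
adj_n4 <- p.adjust(pB, method = "BH", n = 4)

print(adj_default)
print(adj_n4)
```

Only the default call corresponds to what group_by() + mutate() does in the question; n = 4 is shown just to make the counting visible. The p.adjust help page advises setting n to a non-default value only when you know what you are doing.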

@zesla your code is fine. Your confusion lies in the difference between the family-wise error rate (FWER) and the false discovery rate (FDR), which is what BH controls. With BH you are much more likely to see those equal values: the largest p-value in a group is multiplied by n/n = 1, so it is never changed, and in both of your groups the last row happens to hold the largest p-value.
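To see where the equal and near-equal values come from, here is a hand-written version of the BH step-up rule, for illustration only (bh_by_hand is a hypothetical helper, not part of base R or dplyr). Each p-value is multiplied by n/k, where k is its rank, so the largest gets a factor of 1; a running minimum from the largest p-value downward then caps the smaller ones, which is why some adjusted values end up much closer to the raw p-value than others:

```r
# Hypothetical helper: a hand-written Benjamini-Hochberg adjustment,
# for illustration only; in practice call p.adjust(p, method = "BH").
bh_by_hand <- function(p) {
  ok <- !is.na(p)                   # NAs are left in place and not counted
  q <- p[ok]
  n <- length(q)                    # number of non-missing tests
  o <- order(q, decreasing = TRUE)  # largest p-value first
  k <- n:1                          # rank k of each sorted p-value
  # raw BH value n / k * p, then a running minimum from the largest p down
  adj <- pmin(1, cummin(n / k * q[o]))
  out <- p
  out[ok] <- adj[order(o)]          # put the values back in the input order
  out
}

# Group A's p-values as printed in the question
pA <- c(0.02875775, 0.07883051, 0.04089769, 0.08830174)
bh_by_hand(pA)
# [1] 0.08179538 0.08830174 0.08179538 0.08830174
all.equal(bh_by_hand(pA), p.adjust(pA, method = "BH"))
# [1] TRUE
```

In group A, comp 2 (0.0788) would become 4/3 × 0.0788 ≈ 0.105, but the running minimum caps it at the largest p-value, 0.0883, so it lands close to its raw value.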

If you look at the documentation for p.adjust and run its example code:

set.seed(123)
x <- rnorm(50, mean = c(rep(0, 25), rep(3, 25)))
p <- 2*pnorm(sort(-abs(x)))

round(p, 3)
round(p.adjust(p), 3)
round(p.adjust(p, "BH"), 3)

you'll see the same effect. You can also run a traditional family-wise error rate adjustment such as Holm to see the effect on your own data:

dat %>%
  group_by(grp) %>%
  mutate(pval.adj = p.adjust(pval, method = 'holm'))
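The same comparison can be sketched without dplyr on group A's printed p-values. Holm controls the family-wise error rate, and its adjusted p-values are never smaller than BH's, so here the largest p-value is no longer returned unchanged:

```r
# Group A's p-values as printed in the question
pA <- c(0.02875775, 0.07883051, 0.04089769, 0.08830174)

# Side by side: raw p-values, BH (FDR) and Holm (FWER) adjustments
cmp <- cbind(pval = pA,
             BH   = p.adjust(pA, method = "BH"),
             holm = p.adjust(pA, method = "holm"))
round(cmp, 4)
```

Holm uses a running maximum over increasing p-values where BH uses a running minimum over decreasing ones, which is why the "last value equals its raw p-value" pattern from the question disappears with Holm.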

See also this article on how BH is calculated.

You are calling p.adjust on each p-value independently. I usually call p.adjust outside of the pipe for this reason.
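For reference, one way to do the grouped adjustment outside the pipe is base R's ave(), which applies FUN to each group's whole pval vector and returns the results in the original row order. Within group_by() + mutate(), p.adjust() likewise receives each group's whole vector: a single p-value is returned unchanged (p.adjust(0.03, method = "BH") gives 0.03), so the changed values in the question's output indicate the groups were adjusted as a whole, and both approaches should agree:

```r
# Rebuild the question's example data
set.seed(123)
dat <- data.frame(
  comp = rep(1:4, 2),
  grp  = rep(c("A", "B"), each = 4),
  pval = runif(8, min = 0, max = 0.1)
)
dat$pval[sample(nrow(dat), 1)] <- NA

# ave() splits pval by grp, calls FUN once per group on the whole vector,
# and puts the results back in the original row order
dat$pval.adj <- ave(dat$pval, dat$grp,
                    FUN = function(p) p.adjust(p, method = "BH"))
dat
```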
