apply p.adjust function after group_by in dplyr
My data dat looks like this:
set.seed(123)
dat <- data.frame(
  comp = rep(1:4, 2),
  grp  = rep(c('A', 'B'), each = 4),
  pval = runif(8, min = 0, max = 0.1)
)
dat$pval[sample(nrow(dat), 1)] <- NA
The pval column contains p-values from multiple t-tests within each large group.
Now I need to apply the base R function p.adjust to adjust the p-values within each group (A, B, ...). What I did was:
library(dplyr)

dat %>%
  group_by(grp) %>%
  mutate(pval.adj = p.adjust(pval, method = 'BH'))
Below is the output of the above code:
comp grp pval pval.adj
1 A 0.02875775 0.08179538
2 A 0.07883051 0.08830174
3 A 0.04089769 0.08179538
4 A 0.08830174 0.08830174
1 B NA NA
2 B 0.00455565 0.01366695
3 B 0.05281055 0.07921582
4 B 0.08924190 0.08924190
The result does not make sense. For the last entry of each group, pval and pval.adj are equal. Some pval.adj values are much closer to pval than others. I think something is wrong with applying the p.adjust function after group_by. I spent hours on this but could not figure out why. I would appreciate it if someone could help me with that.
Below is the p.adjust function usage:
p.adjust(p, method = p.adjust.methods, n = length(p))
p.adjust.methods
# c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY",
# "fdr", "none")
@zesla your code is fine. Your confusion lies in the difference between the family-wise error rate and the false discovery rate, which is what BH controls. With BH you are much more likely to see those equal values.
If you look at the documentation for p.adjust and run the sample code:
set.seed(123)
x <- rnorm(50, mean = c(rep(0, 25), rep(3, 25)))
p <- 2*pnorm(sort(-abs(x)))
round(p, 3)
round(p.adjust(p), 3)
round(p.adjust(p, "BH"), 3)
you'll see the same effect. You can also run a traditional family-wise error rate adjustment such as Holm to see the effect on your own data:
dat %>%
  group_by(grp) %>%
  mutate(pval.adj = p.adjust(pval, method = 'holm'))
See also this article on how BH is calculated.
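For anyone who wants to see where the equal values come from, here is a minimal sketch of the BH adjustment written out by hand. The helper name bh_manual is made up for illustration, and the p-values are copied from the output in the question rather than regenerated. Each sorted p-value is multiplied by n/rank, so the largest one (rank n) is multiplied by 1 and stays as it is. A running minimum taken from the largest p-value downwards then keeps the adjusted values monotone, which is what produces the ties.

```r
# Illustrative re-implementation of the BH adjustment (hypothetical helper,
# not part of base R); stats::p.adjust() is what you would use in practice.
bh_manual <- function(p) {
  ok <- !is.na(p)
  n  <- sum(ok)                          # NAs are not counted as tests
  if (n == 0) return(p)
  o  <- order(p[ok], decreasing = TRUE)  # largest p-value first
  ro <- order(o)                         # index to restore the original order
  scaled <- p[ok][o] * n / (n:1)         # p * n / rank; rank n gets factor 1
  adj <- pmin(1, cummin(scaled))[ro]     # running minimum creates the ties
  out <- p
  out[ok] <- adj
  out
}

# p-values per group, copied from the output posted in the question
grp_a <- c(0.02875775, 0.07883051, 0.04089769, 0.08830174)
grp_b <- c(NA, 0.00455565, 0.05281055, 0.08924190)

bh_manual(grp_a)
p.adjust(grp_a, method = "BH")   # should agree with the line above
bh_manual(grp_b)
p.adjust(grp_b, method = "BH")
```

In group A the third-ranked value would be scaled above the largest p-value, so the running minimum pulls it back down to it, which is why two rows end up with the same pval.adj.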
You are calling p.adjust on each p-value independently. I usually call p.adjust outside of the pipe for this reason.
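One way to check whether the adjustment is really applied per group is to redo it outside the pipe and compare. The sketch below uses base R's ave() on the p-values copied from the question's output (hard-coded here, since sample() may pick a different row on other R versions); posted_adj is just a name chosen for the comparison. If p.adjust were seeing one value at a time, BH would return each p-value unchanged, so agreement with the posted pval.adj column suggests the grouped mutate already adjusts within each group.

```r
# Data as printed in the question (pval hard-coded from the posted output).
dat <- data.frame(
  comp = rep(1:4, 2),
  grp  = rep(c("A", "B"), each = 4),
  pval = c(0.02875775, 0.07883051, 0.04089769, 0.08830174,
           NA, 0.00455565, 0.05281055, 0.08924190)
)

# Per-group BH adjustment outside the pipe, using base R only.
dat$pval.adj <- ave(dat$pval, dat$grp,
                    FUN = function(p) p.adjust(p, method = "BH"))

# pval.adj column as posted in the question, kept for comparison.
posted_adj <- c(0.08179538, 0.08830174, 0.08179538, 0.08830174,
                NA, 0.01366695, 0.07921582, 0.08924190)

# Compare the base R result with the posted dplyr result.
isTRUE(all.equal(dat$pval.adj, posted_adj, tolerance = 1e-6))
```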