Isolating plots with significant p-values

Question

For starters, the data comes from the us_contagious_diseases dataset (loaded from dslabs), and the packages are tidyverse and ggpubr:

library(dslabs)
library(ggpubr)
library(tidyverse)
data("us_contagious_diseases")

I modified this dataset with the code below:

sdf <- us_contagious_diseases %>%
  filter(disease == 'Rubella' | disease == 'Mumps') %>%
  transmute(disease, count, population, state)

Then I created a boxplot comparing the number of Rubella and Mumps cases in each state:

sdf_plot <- ggplot(sdf, mapping = aes(x = disease, y = count)) +
  geom_boxplot(outlier.shape = NA) +
  facet_wrap('state', scales = 'free') +
  stat_compare_means(method = 't.test', label.y.npc = 0.8)

The thing is, there are FIFTY-ONE plots in this figure! That's way too many to include in my report. More importantly, many of these comparisons don't have significant p-values. Is there a way I can pull out just those plots that have a p-value less than 0.01?

Answer 1

I guess you need to pre-calculate the p-values:

library(broom)
res = sdf %>% group_by(state) %>% do(tidy(t.test(count~disease,data=.)))
head(res)

# A tibble: 6 x 11
# Groups:   state [6]
  state estimate estimate1 estimate2 statistic p.value parameter conf.low
  <fct>    <dbl>     <dbl>     <dbl>     <dbl>   <dbl>     <dbl>    <dbl>
1 Alab…    125.       180.      55.4     2.46  0.0181       42.7     22.4
2 Alas…     66.1      104.      38.2     1.52  0.136        45.4    -21.7
3 Ariz…     78.6      266.     187.      0.657 0.513        68.0   -160. 
4 Arka…     84.3      113.      28.4     2.87  0.00628      45.5     25.1
5 Cali…    386.      1915.    1529.      0.540 0.592        59.3  -1046. 
6 Colo…     95.0      314.     219.      0.762 0.449        62.6   -154. 

keep = res$state[res$p.value<0.01]
[1] Arkansas             District Of Columbia Georgia             
[4] Kansas               Maryland             Nevada              
[7] Ohio

Then plot using this filter:

sdf_plot <- ggplot(subset(sdf, state %in% keep), aes(x = disease, y = count)) +
  geom_boxplot(outlier.shape = NA) +
  facet_wrap('state', scales = 'free') +
  stat_compare_means(method = 't.test', label.y.npc = 0.8)
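
If you would rather not use broom and do(), the same two steps (one t-test per state, then filter on the p-value) can probably be written with summarise() instead. This is only a sketch under a few assumptions: the names pvals, sdf_sig and sdf_plot_sig are made up for illustration, it assumes dplyr 1.0 or later for the .groups argument, and it passes the two groups to t.test() as vectors rather than through the formula, which should give the same Welch p-value.

```r
library(dslabs)
library(tidyverse)
library(ggpubr)

data("us_contagious_diseases")

# Same subset as in the question
sdf <- us_contagious_diseases %>%
  filter(disease == 'Rubella' | disease == 'Mumps') %>%
  transmute(disease, count, population, state)

# One Welch t-test per state, keeping only the p-value.
# 'pvals' is an illustrative name, not something from the question.
pvals <- sdf %>%
  group_by(state) %>%
  summarise(
    p.value = t.test(count[disease == 'Rubella'],
                     count[disease == 'Mumps'])$p.value,
    .groups = 'drop'
  )

# States whose comparison falls below the 0.01 threshold
keep <- pvals %>%
  filter(p.value < 0.01) %>%
  pull(state)

# Drop the unused state levels so only the kept states remain as factor levels
sdf_sig <- sdf %>%
  filter(state %in% keep) %>%
  droplevels()

sdf_plot_sig <- ggplot(sdf_sig, aes(x = disease, y = count)) +
  geom_boxplot(outlier.shape = NA) +
  facet_wrap('state', scales = 'free') +
  stat_compare_means(method = 't.test', label.y.npc = 0.8)

print(sdf_plot_sig)
```

Keeping pvals as a separate table also makes it easy to change the cutoff later, or to report the p-values alongside the figure.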
