
boxplot using ggplot with n > 5

I am sure this question has been asked before, but I was unable to find anything similar. So consider a simple worked example.

We create random data and then create boxplots:

set.seed(123456)
Ax <- sample(1:3, size = 75, replace = TRUE)
Fac <- sample(LETTERS[1:4], 75, replace = TRUE)
yvalue <- runif(75)

df1 <- data.frame(Ax, Fac, yvalue)

library(ggplot2)
ggplot(df1, aes(factor(Ax), yvalue, colour = Fac)) + 
  geom_boxplot()

But when we look at the data more closely:

table(df1$Ax, df1$Fac)

I want to create a boxplot like the one above, but when a group's size (n) is less than 6, then either:

  • Do not draw the boxplot at all
  • OR only draw a vertical line at the median

That is, for the data marked with red circles below: [image]

You can try:

Add a column holding the number of observations in each Ax/Fac group using ave():

df1$length <- ave(df1$yvalue, interaction(df1$Ax, df1$Fac), FUN=length)
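
To see what ave() does here, a minimal sketch on a made-up data frame may help (the names toy, g, y and n are illustrative and not part of the original answer). With FUN = length, ave() returns a vector as long as its input, in which every row carries the size of the group it belongs to:

```r
# Toy data: group "a" has two rows, group "b" has one
toy <- data.frame(g = c("a", "a", "b"), y = c(1.5, 2.5, 9))

# ave() applies FUN within each group and writes the result back row by row
toy$n <- ave(toy$y, toy$g, FUN = length)

toy$n  # 2 2 1
```

Because the result lines up with the rows of the data frame, it can be used directly inside aes() or in a logical filter.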

Now, for instance, map alpha to the group size so that the small groups are drawn as faded boxes:

ggplot(df1, aes(factor(Ax), yvalue, fill = Fac, alpha = factor(ifelse(length < 6, 0.5, 1)))) +
  geom_boxplot()
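
For the other option in the question (only a line at the median for small groups), one possible sketch is to compute the box statistics yourself and draw them with geom_boxplot(stat = "identity"). This is an assumption-laden illustration, not the answerer's method: box_stats, stats_df and min_n are hypothetical names, the hinges come from boxplot.stats() and can differ slightly from the quantiles ggplot2 computes, and outlier points are not drawn.

```r
library(ggplot2)

# Recreate the example data from the question
set.seed(123456)
df1 <- data.frame(
  Ax = sample(1:3, size = 75, replace = TRUE),
  Fac = sample(LETTERS[1:4], 75, replace = TRUE),
  yvalue = runif(75)
)

# Hypothetical helper: one row of box statistics per group.
# Groups with fewer than min_n rows collapse to their median.
box_stats <- function(d, min_n = 6) {
  s <- boxplot.stats(d$yvalue)$stats  # whisker, hinge, median, hinge, whisker
  if (nrow(d) < min_n) s <- rep(s[3], 5)
  data.frame(Ax = d$Ax[1], Fac = d$Fac[1], n = nrow(d),
             ymin = s[1], lower = s[2], middle = s[3],
             upper = s[4], ymax = s[5])
}

groups <- split(df1, list(df1$Ax, df1$Fac), drop = TRUE)
stats_df <- do.call(rbind, lapply(groups, box_stats))

# Every group keeps its slot, so the dodged positions stay aligned
p <- ggplot(stats_df, aes(factor(Ax), colour = Fac,
                          ymin = ymin, lower = lower, middle = middle,
                          upper = upper, ymax = ymax)) +
  geom_boxplot(stat = "identity")
p
```

Since each group still contributes one row, the small groups occupy their usual position and show up as a single horizontal line at the median.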


If you don't care about keeping placeholder spaces where the boxplots used to be, you can simply remove the observations that don't meet your criteria. The example below uses dplyr for the data manipulation:

library(dplyr)
library(ggplot2)

### Identify all groups that have > 5 observations per group
df2 <- df1 %>%
  group_by(Fac, Ax) %>%
  summarise(n = n()) %>%
  filter(n > 5)

### Only keep groups that meet our criteria
df3 <- df1 %>% semi_join(df2, by = c("Fac", "Ax"))

ggplot(df3, aes(factor(Ax), yvalue, colour = Fac)) + 
  geom_boxplot()
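
The summarise-then-semi_join step can also be written as a single grouped filter. The following is a sketch of that alternative, assuming the same example data as in the question; it should keep the same rows as the two-step version above:

```r
library(dplyr)

# Recreate the example data from the question
set.seed(123456)
df1 <- data.frame(
  Ax = sample(1:3, size = 75, replace = TRUE),
  Fac = sample(LETTERS[1:4], 75, replace = TRUE),
  yvalue = runif(75)
)

# n() is evaluated per group, so whole groups are kept or dropped together
df3 <- df1 %>%
  group_by(Fac, Ax) %>%
  filter(n() > 5) %>%
  ungroup()

# Remaining group sizes: every non-empty cell should be larger than 5
table(df3$Ax, df3$Fac)
```

The ungroup() call is there so that later steps are not accidentally computed per group.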


 