
R: Displaying mean and median labels on boxplot ggplot

I've just started working with R and am trying to find out how to add mean and median labels to a box plot using ggplot.
I have a dataset with the columns Unit, Quarter, Days (number of days) and Z:

dset <- read.table(text='Unit     Quarter  Days   Z  
HH       1Q      25  Y      
PA       1Q      28  N     
PA       1Q      10  Y     
HH       1Q      53  Y
HH       1Q      12  Y
HH       1Q      20  Y
HH       1Q      43  N
PA       1Q      11  Y
PA       1Q      66  Y
PA       1Q      54  Y      
PA       2Q      19  N
PA       2Q      46  Y
PA       2Q      37  Y
HH       2Q      22  Y      
HH       2Q      67  Y      
PA       2Q      45  Y
HH       2Q      48  Y
HH       2Q      15  N
PA       3Q      12  Y               
PA       3Q      53  Y      
HH       3Q      58  Y
HH       3Q      41  N
HH       3Q      18  Y
PA       3Q      26  Y
PA       3Q      12  Y
HH       3Q      63  Y
                   ', header=TRUE)

I need to show data by Unit and Quarter and create a boxplot displaying mean and median values.
My code for a boxplot:

library(ggplot2)

ggplot(data = dset, aes(x = Quarter, y = Days, fill = Quarter)) +
  geom_boxplot(outlier.shape = NA) +
  facet_grid(. ~ Unit) + # adding another dimension
  coord_cartesian(ylim = c(10, 60)) + # sets the y-axis limits
  stat_summary(fun = mean, geom = "point", shape = 20, size = 3, color = "red", fill = "red") + # adds average dot
  geom_text(data = means, aes(label = round(Days, 1), y = Days + 1), size = 3) + # adds average labels
  geom_text(data = medians, aes(label = round(Days, 1), y = Days - 0.5), size = 3) + # adds median labels
  xlab(" ") +
  ylab("Days") +
  ggtitle("Days") +
  theme(legend.position = 'none')

I can use the geom_text function to add mean and median labels, but only for one dimension ("Quarter"), and it requires calculating the mean and median values beforehand:

means <- aggregate(Days ~  Quarter, dset, mean)
medians <- aggregate(Days ~  Quarter, dset, median)

This works pretty well, and I also managed to calculate the mean and median values by both "Unit" and "Quarter":

means <- aggregate(dset[, 'Days'], list('Unit' = dset$Unit, 'Quarter' = dset$Quarter), mean)
medians <- aggregate(dset[, 'Days'], list('Unit' = dset$Unit, 'Quarter' = dset$Quarter), median)

but I do not know how to pass those variables to the geom_text function to display the mean and median labels. Maybe I should calculate the mean and median in a different way, or there are other ways to add those labels.
I would be grateful for any suggestions!

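It looks like the problem is that when you calculate the mean and median values by both "Unit" and "Quarter" this way, the variable that used to be called "Days" is now called "x": aggregate() is given a plain vector (dset[, 'Days']) rather than a formula, so it names the summarized column x. Simply update your geom_text calls to use x instead of Days. Because means and medians now also carry a "Unit" column, facet_grid() puts each label in its matching panel.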

library(ggplot2)

ggplot(data = dset, aes(x = Quarter, y = Days, fill = Quarter)) +
  geom_boxplot(outlier.shape = NA) +
  facet_grid(. ~ Unit) + # adding another dimension
  coord_cartesian(ylim = c(10, 60)) + # sets the y-axis limits
  stat_summary(fun = mean, geom = "point", shape = 20, size = 3, color = "red", fill = "red") + # adds average dot
  geom_text(data = means, aes(label = round(x, 1), y = x + 1), size = 3) + # adds average labels (column is now x)
  geom_text(data = medians, aes(label = round(x, 1), y = x - 0.5), size = 3) + # adds median labels
  xlab(" ") +
  ylab("Days") +
  ggtitle("Days") +
  theme(legend.position = 'none')
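Since the question also asks about other ways to add the labels: stat_summary() can draw the labels itself with geom = "text", computing the mean or median per Quarter inside each Unit panel, so no precomputed means/medians tables are needed. This is a sketch that assumes ggplot2 3.3.0 or later (for fun = and after_stat()); it rebuilds the question's data with data.frame() only so the snippet runs on its own, and uses position_nudge() in place of the y + 1 / y - 0.5 offsets.

```r
library(ggplot2)

# The question's data, rebuilt with data.frame() so this sketch is self-contained
dset <- data.frame(
  Unit    = c("HH", "PA", "PA", "HH", "HH", "HH", "HH", "PA", "PA", "PA",
              "PA", "PA", "PA", "HH", "HH", "PA", "HH", "HH",
              "PA", "PA", "HH", "HH", "HH", "PA", "PA", "HH"),
  Quarter = rep(c("1Q", "2Q", "3Q"), times = c(10, 8, 8)),
  Days    = c(25, 28, 10, 53, 12, 20, 43, 11, 66, 54,
              19, 46, 37, 22, 67, 45, 48, 15,
              12, 53, 58, 41, 18, 26, 12, 63),
  Z       = c("Y", "N", "Y", "Y", "Y", "Y", "N", "Y", "Y", "Y",
              "N", "Y", "Y", "Y", "Y", "Y", "Y", "N",
              "Y", "Y", "Y", "N", "Y", "Y", "Y", "Y")
)

p <- ggplot(dset, aes(x = Quarter, y = Days, fill = Quarter)) +
  geom_boxplot(outlier.shape = NA) +
  facet_grid(. ~ Unit) +
  coord_cartesian(ylim = c(10, 60)) +
  # red dot at the mean, as in the original code
  stat_summary(fun = mean, geom = "point", shape = 20, size = 3, color = "red") +
  # mean label: stat_summary returns one y (the mean) per Quarter in each Unit panel
  stat_summary(fun = mean, geom = "text", size = 3,
               aes(label = round(after_stat(y), 1)),
               position = position_nudge(y = 1)) +
  # median label, nudged slightly below the median line
  stat_summary(fun = median, geom = "text", size = 3,
               aes(label = round(after_stat(y), 1)),
               position = position_nudge(y = -0.5)) +
  labs(x = " ", y = "Days", title = "Days") +
  theme(legend.position = "none")

print(p)
```

Because the statistic is computed per panel, the same code keeps working if you later subset the data (for example to Z == "Y"): only the data passed to ggplot() changes.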

In answer to your second question, I think you are looking for something like this. This code produces the same chart, but restricted to the subsample where Z is "Y":

keep <- dset$Z == "Y" # rows in the Z = Y subsample
means <- aggregate(dset$Days[keep], list(Unit = dset$Unit[keep], Quarter = dset$Quarter[keep]), mean)
medians <- aggregate(dset$Days[keep], list(Unit = dset$Unit[keep], Quarter = dset$Quarter[keep]), median)

ggplot(data = dset[dset$Z == "Y", ], aes(x = Quarter, y = Days, fill = Quarter)) +
  geom_boxplot(outlier.shape = NA) +
  facet_grid(. ~ Unit) + # adding another dimension
  coord_cartesian(ylim = c(10, 60)) + # sets the y-axis limits
  stat_summary(fun = mean, geom = "point", shape = 20, size = 3, color = "red", fill = "red") + # adds average dot
  geom_text(data = means, aes(label = round(x, 1), y = x + 1), size = 3) + # adds average labels
  geom_text(data = medians, aes(label = round(x, 1), y = x - 0.5), size = 3) + # adds median labels
  xlab(" ") +
  ylab("Days") +
  ggtitle("Days") +
  theme(legend.position = 'none')
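If you would rather keep the summary column named Days (so the geom_text calls from the question work unchanged), the formula method of aggregate() should give the same Z = Y tables: grouping by Unit + Quarter keeps the response name, and its subset argument is evaluated inside the data. This is a sketch; the data.frame() rebuild of dset is only there to make it self-contained.

```r
# The question's data, rebuilt with data.frame() so the snippet runs on its own
dset <- data.frame(
  Unit    = c("HH", "PA", "PA", "HH", "HH", "HH", "HH", "PA", "PA", "PA",
              "PA", "PA", "PA", "HH", "HH", "PA", "HH", "HH",
              "PA", "PA", "HH", "HH", "HH", "PA", "PA", "HH"),
  Quarter = rep(c("1Q", "2Q", "3Q"), times = c(10, 8, 8)),
  Days    = c(25, 28, 10, 53, 12, 20, 43, 11, 66, 54,
              19, 46, 37, 22, 67, 45, 48, 15,
              12, 53, 58, 41, 18, 26, 12, 63),
  Z       = c("Y", "N", "Y", "Y", "Y", "Y", "N", "Y", "Y", "Y",
              "N", "Y", "Y", "Y", "Y", "Y", "Y", "N",
              "Y", "Y", "Y", "N", "Y", "Y", "Y", "Y")
)

# Formula method: the result has columns Unit, Quarter, Days (not x),
# and subset = Z == "Y" keeps only the Z = Y rows before summarizing
means   <- aggregate(Days ~ Unit + Quarter, data = dset, FUN = mean,   subset = Z == "Y")
medians <- aggregate(Days ~ Unit + Quarter, data = dset, FUN = median, subset = Z == "Y")

print(means)
print(medians)
```

These tables can then be used with geom_text(data = means, aes(label = round(Days, 1), y = Days + 1), size = 3) exactly as in the question, while the plot's own data stays dset[dset$Z == "Y", ].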
