R: Displaying mean and median labels on boxplot ggplot

I've just started working with R and am trying to find out how to add mean and median labels to a box plot using ggplot.
I have a dataset with Unit, Quarter, and number of Days:

dset <- read.table(text='Unit     Quarter  Days   Z  
HH       1Q      25  Y      
PA       1Q      28  N     
PA       1Q      10  Y     
HH       1Q      53  Y
HH       1Q      12  Y
HH       1Q      20  Y
HH       1Q      43  N
PA       1Q      11  Y
PA       1Q      66  Y
PA       1Q      54  Y      
PA       2Q      19  N
PA       2Q      46  Y
PA       2Q      37  Y
HH       2Q      22  Y      
HH       2Q      67  Y      
PA       2Q      45  Y
HH       2Q      48  Y
HH       2Q      15  N
PA       3Q      12  Y               
PA       3Q      53  Y      
HH       3Q      58  Y
HH       3Q      41  N
HH       3Q      18  Y
PA       3Q      26  Y
PA       3Q      12  Y
HH       3Q      63  Y
                   ', header=TRUE)

I need to show the data by Unit and Quarter and create a boxplot displaying the mean and median values.
My code for the boxplot:

library(ggplot2)

ggplot(data = dset, aes(x = Quarter, y = Days, fill = Quarter)) +
  geom_boxplot(outlier.shape = NA) + 
  facet_grid(. ~ Unit) + # adding another dimension
  coord_cartesian(ylim = c(10, 60)) + #sets the y-axis limits
  stat_summary(fun = mean, geom = "point", shape = 20, size = 3, color = "red", fill = "red") + # adds average dot; `fun` replaces `fun.y`, deprecated in ggplot2 3.3.0
  geom_text(data = means, aes(label = round(Days, 1), y = Days + 1), size = 3) + #adds average labels
  geom_text(data = medians, aes(label = round(Days, 1), y = Days - 0.5), size = 3) + #adds median labels
  xlab(" ") +
  ylab("Days") +
  ggtitle("Days") +
  theme(legend.position = 'none')

I can use the geom_text function to add mean and median labels, but only for one dimension ("Quarter"), and it requires calculating the mean and median values beforehand:

means <- aggregate(Days ~  Quarter, dset, mean)
medians <- aggregate(Days ~  Quarter, dset, median)

That works well, and I also managed to calculate the mean and median values by both "Unit" and "Quarter":

means <- aggregate(dset[, 'Days'], list('Unit' = dset$Unit, 'Quarter' = dset$Quarter), mean)
medians <- aggregate(dset[, 'Days'], list('Unit' = dset$Unit, 'Quarter' = dset$Quarter), median)

but I do not know how to pass those variables to the geom_text function to display labels for the mean and median. Maybe I should calculate the mean and median in a different way, or maybe there are other ways to add those labels.
I would be grateful for any suggestions!

Looks like the problem is that when you calculate the mean and median values by both "Unit" and "Quarter", the variable that used to be called "Days" is now called "x". So simply update your geom_text commands to reflect this.

ggplot(data = dset, aes(x = Quarter, y = Days, fill = Quarter))  +
  geom_boxplot(outlier.shape = NA) + 
  facet_grid(. ~ Unit) + # adding another dimension
  coord_cartesian(ylim = c(10, 60)) + #sets the y-axis limits
  stat_summary(fun = mean, geom = "point", shape = 20, size = 3, color = "red", fill = "red") + # adds average dot; `fun` replaces `fun.y`, deprecated in ggplot2 3.3.0
  geom_text(data = means, aes(label = round(x, 1), y = x + 1), size = 3) + #adds average labels
  geom_text(data = medians, aes(label = round(x, 1), y = x - 0.5), size = 3) + #adds median labels
  xlab(" ") +
  ylab("Days") +
  ggtitle("Days") +
  theme(legend.position = 'none')
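
The renaming can be checked directly, and one possible way around it is the formula interface of aggregate with two grouping variables, which keeps the column name "Days". The following is a minimal sketch, not the answerer's method: `dset_small` is a hypothetical stand-in built from a few rows of the question's data, and `means_x`, `means_days`, and `medians_days` are illustrative names.

```r
# A few rows copied from the question's dataset, so the sketch is self-contained
dset_small <- data.frame(
  Unit    = c("HH", "HH", "PA", "PA", "HH", "HH", "PA", "PA"),
  Quarter = c("1Q", "1Q", "1Q", "1Q", "2Q", "2Q", "2Q", "2Q"),
  Days    = c(25, 53, 28, 10, 22, 67, 19, 46)
)

# Vector interface (as in the question): the summarised column is named "x"
means_x <- aggregate(dset_small[, "Days"],
                     list(Unit = dset_small$Unit, Quarter = dset_small$Quarter),
                     mean)
print(names(means_x))

# Formula interface: the summarised column keeps the name "Days"
means_days   <- aggregate(Days ~ Unit + Quarter, dset_small, mean)
medians_days <- aggregate(Days ~ Unit + Quarter, dset_small, median)
print(names(means_days))
```

With data frames shaped like `means_days` and `medians_days`, the geom_text calls from the question (which refer to `Days`) should work unchanged, because each label row carries the `Unit` value that facet_grid uses to place it in the right panel.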

In answer to your second question, I think you are looking for something like this. This code produces the same chart but restricted to the subsample Z = Y.

keep <- dset$Z == "Y"  # rows belonging to the Z = Y subsample
means <- aggregate(dset$Days[keep], list('Unit' = dset$Unit[keep], 'Quarter' = dset$Quarter[keep]), mean)
medians <- aggregate(dset$Days[keep], list('Unit' = dset$Unit[keep], 'Quarter' = dset$Quarter[keep]), median)

ggplot(data = dset[dset$Z=="Y",], aes(x = Quarter, y = Days, fill = Quarter))  +
  geom_boxplot(outlier.shape = NA) + 
  facet_grid(. ~ Unit) + # adding another dimension
  coord_cartesian(ylim = c(10, 60)) + #sets the y-axis limits
  stat_summary(fun = mean, geom = "point", shape = 20, size = 3, color = "red", fill = "red") + # adds average dot; `fun` replaces `fun.y`, deprecated in ggplot2 3.3.0
  geom_text(data = means, aes(label = round(x, 1), y = x + 1), size = 3) + #adds average labels
  geom_text(data = medians, aes(label = round(x, 1), y = x - 0.5), size = 3) + #adds median labels
  xlab(" ") +
  ylab("Days") +
  ggtitle("Days") +
  theme(legend.position = 'none')
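
The question also asks whether the labels can be added without computing the means and medians beforehand. One option that may fit is to let stat_summary draw the text itself, so the summaries are computed per facet and per quarter inside the plot. This is a hedged sketch rather than part of the original answer: it assumes ggplot2 3.3.0 or later (for `fun` and `after_stat`), and `dset_small` is a hypothetical stand-in built from a few rows of the question's data.

```r
library(ggplot2)

# A few rows copied from the question's dataset, so the sketch is self-contained
dset_small <- data.frame(
  Unit    = rep(c("HH", "PA", "HH", "PA"), each = 3),
  Quarter = rep(c("1Q", "1Q", "2Q", "2Q"), each = 3),
  Days    = c(25, 53, 12,  28, 10, 11,  22, 67, 48,  19, 46, 37)
)

p <- ggplot(dset_small, aes(x = Quarter, y = Days, fill = Quarter)) +
  geom_boxplot(outlier.shape = NA) +
  facet_grid(. ~ Unit) +
  # mean label: stat_summary computes the mean for each Unit/Quarter group
  stat_summary(fun = mean, geom = "text",
               aes(label = round(after_stat(y), 1)),
               vjust = -0.5, size = 3, color = "red") +
  # median label, nudged below the median line
  stat_summary(fun = median, geom = "text",
               aes(label = round(after_stat(y), 1)),
               vjust = 1.5, size = 3) +
  theme(legend.position = "none")

print(p)
```

Because the summaries are computed from whatever data the plot receives, passing a subset such as `dset[dset$Z == "Y", ]` as the data would update the labels without any separate aggregate step. The `vjust` offsets are arbitrary choices for readability.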


 