简体   繁体   中英

How to add summary information to geom_boxplot

I have been using R/ggplot2 to successfully generate graphs that are very close to those generated by a commercial tool used in my company. But there are a few features that I am unable to implement to my boxplots generated using R.

  1. A summary "table" at the bottom of the graph showing items like, median, count, outliers.
  2. Display the strip text such that duplicate labels are removed.

Sample R code:

library(ggplot2)
library(data.table)
library(reshape2)
library(grid)

# create dataset
dt <- data.table(mpg)

# melt the data table
dtm <- data.table(
  melt(data=dt,
       id.vars=c("manufacturer","model","displ","year","cyl","trans","drv","fl","class"),
       variable.name="mode", value.name="mpg"))
write.csv(dtm,file="dtm.csv",row.names=F)

# draw some plots
p <- ggplot(dtm, aes(x=mode,y=mpg)) +
  geom_boxplot(aes(fill=mode), varwidth=F) +
  facet_grid( ~ manufacturer + year ) +
  theme_bw() +
  theme(panel.margin=unit(0,"mm"), panel.grid=element_blank()) +
  theme(axis.text.x=element_blank(), axis.title.x=element_blank()) +
  theme(legend.position="bottom") +
  coord_cartesian(ylim=c(0,50))
p

ggsave(plot=p, filename='ddtm_r.png', dpi=72, width=16, height=8)

The output generated by R is here,

R / ggplot2生成的箱线图

The same boxplot generated by the commercial tool is here,

商业工具生成的箱线图

As you can see the data in the graphs are idential (as expected), but the strip headers are better organized in the commercial version, and I can add the summary table at the bottom of the graph as well.

Is there any way in R to do something similar ?

Thanks & regards,

Derric

UPDATED: 7 June 2014

With the suggestions from colleagues and online help, I am able to now plot graphs that contain the boxplot with the summary table appended to the bottom of the boxplot. The main idea was to extract the panel information from the boxplot grob and then use this to generate the text table, and then redraw the plot with the two grobs. The legend of the original boxplot had to be removed in order to get correct right hand side alignment.

The modified R code is as follows,

# load the libraries
library(grid)
library(gridExtra)
library(data.table)
library(reshape2)
library(ggplot2)
library(gtable)
library(plyr)

# create dataset
dt <- data.table(mpg)

# melt the data table
dtm <- data.table(
  melt(data=dt,
        id.vars=c("manufacturer","model","displ","year","cyl","trans","drv","fl","class"),
       variable.name="mode", value.name="mpg"))
#write.csv(dtm,file="dtm.csv",row.names=F)

# draw some plots
p1 <- ggplot(dtm, aes(x=factor(year),y=mpg)) +
  geom_boxplot(aes(fill=factor(year)), varwidth=F) +
  facet_grid( ~ manufacturer + mode ) +
  theme_bw() +
  theme(panel.margin=unit(0,"lines"), 
        panel.grid=element_blank(),
        strip.text=element_text(angle=90),
        axis.text.x=element_blank(), 
        axis.title.x=element_blank(),
        axis.ticks.x=element_blank(),
        plot.margin=unit(c(0,0,0,0),"lines"),
        legend.position="right") +
  coord_cartesian(ylim=c(0,50)) +
  xlab(NULL) 

# deconstruct the plot p1
pb <- ggplot_build(p1)
# pb has three groups; data, panel and plot
pb.data <- pb$data
# pb.data[[1]] is a data.frame
pb.data.df <- pb.data[[1]]
# melt the pb.data.df
pb.data.dt <- data.table(pb.data.df)
#pb.data.dt[,':='(outliers=NULL)]
pb.data.dtm <- melt(data=pb.data.dt,
                    #id.vars=c("x","PANEL"),
                    measure.vars=c("middle","lower","upper"),
                    variable.name="mode",
                    value.name="value")

p2 <- ggplot(pb.data.dtm, aes(x=factor(x),y=factor(mode),label=format(value,nsmall=1))) +
  geom_text(size=3.0, angle=90, hjust=0.5) + facet_grid(~ PANEL) +
  theme_bw() +
  scale_y_discrete() +
  theme(panel.margin=unit(0,"lines"),
    panel.grid=element_blank(),
    panel.border=element_rect(), 
    legend.position="right",
    axis.text.x=element_blank(),
    axis.text.y=element_text(angle=0),
    axis.ticks=element_blank(),
    strip.text=element_blank(),
    strip.background=element_blank(),
    plot.margin=unit(c(0,0,0,0),"lines")
    ) +
  xlab(NULL) + ylab(NULL)

# a function to extract the legend from the grob
g_legend <- function(a.gplot) {
  tmp <- ggplotGrob(a.gplot)
  leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
  legend <- tmp$grobs[[leg]]
  legend
}
legend1 <- g_legend(p1)
pa <- arrangeGrob((p1 + theme(legend.position='none')), legend1,
             (p2 + theme(legend.position='none')), 
             ncol=2, nrow=2, 
             heights=c(50/60,10/60), widths=c(95/100,5/100))
ggsave(plot=pa, filename='dtm_r.png',dpi=72,height=10,width=18)

This script now gives me a graph that is shown below,

带有下面的汇总表的箱线图

The issue I have with this graph is that the panels are slightly mis-aligned on the left hand side. Any suggestions on how to align both the left hand side and right hand side of the panels ?

Thanks,

I think you should only use one facet level, and instead include the year in the grouping argument that is passed to 'fill':

 p <- ggplot(dtm, aes(x=mode,y=mpg)) +
  geom_boxplot(aes(fill=interaction(mode,year) ), varwidth=F) +
  facet_grid( ~ manufacturer ) +
  theme_bw() +
  theme(panel.margin=unit(0,"mm"), panel.grid=element_blank()) +
  theme(axis.text.x=element_blank(), axis.title.x=element_blank()) +
  theme(legend.position="bottom") +
  coord_cartesian(ylim=c(0,50))

 p

The divisions between brands is more distinct. Furthermore, you can unambiguously modify the theme arguments so the brand names are fully visible:

png(width=650) ; p <- p + theme(strip.text.x = element_text(size=8, angle=75)); 
print(p); dev.off()

在此处输入图片说明

If you compare these plots, I think it is clear that the one I created conveys information in a manner that is easier to appreciate. The comparison between 1999 and 2008 miles per gallons is much easier for the viewer to see within each 'manufacturer'. The color helps and the facet divisions are only at the "highest level" which allows proper within-group comparisons.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM