
force boxplots from geom_boxplot to constant width

I'm making a boxplot in which x and fill are mapped to different variables, a bit like this:

ggplot(mpg, aes(x=as.factor(cyl), y=cty, fill=as.factor(drv))) + 
    geom_boxplot()


As in the example above, the widths of my boxes come out differently at different x values, because I do not have all possible combinations of x and fill values.

I would like for all the boxes to be the same width. Can this be done (ideally without manipulating the underlying data frame, because I fear that adding fake data will cause me confusion during further analysis)?

My first thought was

+ geom_boxplot(width=0.5)

but this doesn't help; it adjusts the width of the full set of boxplots for a given x factor level.
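One way to check this is to read the drawn box widths back from the built plot. The following is a minimal sketch, assuming that ggplot2's layer_data() returns xmin and xmax columns for the boxplot layer; box_widths is a helper name invented here for illustration.

```r
library(ggplot2)

# Hypothetical helper: widths of the boxes as drawn (xmax - xmin per box)
box_widths <- function(p) {
  d <- layer_data(p, 1)
  round(d$xmax - d$xmin, 6)
}

p_default <- ggplot(mpg, aes(x = as.factor(cyl), y = cty, fill = as.factor(drv))) +
  geom_boxplot()

p_narrow <- ggplot(mpg, aes(x = as.factor(cyl), y = cty, fill = as.factor(drv))) +
  geom_boxplot(width = 0.5)

# Several distinct values in either vector mean the boxes are not equally wide
unique(box_widths(p_default))
unique(box_widths(p_narrow))
```

If both calls print more than one distinct width, the width argument is only scaling each group of boxes, not equalising the individual boxes.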

This post almost seems relevant, but I don't quite see how to apply it to my situation. Using + scale_fill_discrete(drop=FALSE) doesn't seem to change the widths of the bars.

The problem arises because some combinations of factor levels are not present in the data. The number of data points for each combination of the levels of cyl and drv can be checked with xtabs:

library(ggplot2)  # provides the mpg data set
tab <- xtabs(~ drv + cyl, mpg)

tab

#    cyl
# drv  4  5  6  8
#   4 23  0 32 48
#   f 58  4 43  1
#   r  0  0  4 21

There are three empty cells. I will add fake data for these cells to work around the visualization problem.

Check the range of the dependent variable (y-axis). The fake data needs to lie outside this range so that it can be hidden later.

range(mpg$cty)
# [1]  9 35

Create a subset of mpg with the data needed for the plot:

tmp <- mpg[c("cyl", "drv", "cty")]

Create an index for the empty cells:

idx <- which(tab == 0, arr.ind = TRUE)

idx

#   row col
# r   3   1
# 4   1   2
# r   3   2

Create three fake rows (with -1 as the value for cty):

fakeLines <- apply(idx, 1,
                   function(x) 
                     setNames(data.frame(as.integer(dimnames(tab)[[2]][x[2]]), 
                                         dimnames(tab)[[1]][x[1]], 
                                         -1), 
                              names(tmp)))

fakeLines

# $r
#   cyl drv cty
# 1   4   r  -1
# 
# $`4`
#   cyl drv cty
# 1   5   4  -1
# 
# $r
#   cyl drv cty
# 1   5   r  -1

Add the rows to the existing data:

tmp2 <- rbind(tmp, do.call(rbind, fakeLines))
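The index and apply steps above can also be collapsed into a few lines by converting the contingency table to a data frame. This is only a sketch of the same idea, not the code used above; the names cells, empty and fake are made up for illustration.

```r
library(ggplot2)  # provides the mpg data set

tmp <- mpg[c("cyl", "drv", "cty")]

# One row per drv/cyl combination, with its count in the Freq column
cells <- as.data.frame(xtabs(~ drv + cyl, tmp), stringsAsFactors = FALSE)

# Keep only the combinations that never occur in the data
empty <- cells[cells$Freq == 0, ]

# One fake row per empty combination, with cty = -1 (outside the real range)
fake <- data.frame(cyl = as.integer(empty$cyl),
                   drv = empty$drv,
                   cty = -1L)

tmp2 <- rbind(as.data.frame(tmp), fake)
```

As before, the fake rows go into a copy (tmp2), so mpg itself stays untouched for further analysis.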

Plot:

library(ggplot2)
ggplot(tmp2, aes(x = as.factor(cyl), y = cty, fill = as.factor(drv))) + 
  geom_boxplot() +
  coord_cartesian(ylim = c(min(tmp$cty) - 3, max(tmp$cty) + 3))
  # The axis limits have to be changed to suppress displaying the fake data.
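A different approach, which is not the one used above, avoids fake data altogether. This sketch assumes ggplot2 3.0.0 or later, where position_dodge2() accepts preserve = "single" to keep the width of each individual element rather than the total width at each x value. The names p_single and w are illustrative only.

```r
library(ggplot2)

# Assumes ggplot2 >= 3.0.0: preserve = "single" keeps each box at the same width
p_single <- ggplot(mpg, aes(x = as.factor(cyl), y = cty, fill = as.factor(drv))) +
  geom_boxplot(position = position_dodge2(preserve = "single"))

# Read the drawn widths back from the built plot to check they are all equal
d <- layer_data(p_single, 1)
w <- round(d$xmax - d$xmin, 6)
unique(w)

p_single
```

With position_dodge2() the remaining boxes stay centred on their x value; position_dodge(preserve = "single") is a similar option that instead leaves a gap where the missing box would have been.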

