
How to plot boxplots superimposed with sorted points using ggplot2

Using ggplot2, I can plot a boxplot superimposed with points. But the points are located on a vertical line.

library(ggplot2)

example_data <- data.frame(cohort = c("ACC", "ACC", "ACC", "ACC", "ACC", "ACC", "ACC", "ACC", "ACC", "ACC", "CHOL", "CHOL", "CHOL", "CHOL", "CHOL", "CHOL", "CHOL", "CHOL", "CHOL", "CHOL", "DLBC", "DLBC", "DLBC", "DLBC", "DLBC", "DLBC", "DLBC", "DLBC", "DLBC", "DLBC"), 
                           sample = c("A5LI", "A5JQ", "A5JP", "A5LE", "A5LG", "A5JV", "A5JD", "A5J8", "A5K8", "A5L3", "AA33", "AA30", "AA2T", "A95A", "AAZT", "A8I3", "AAV9", "A8Y4", "A8Y8", "AA31", "AAAT", "A9U4", "A7Q1", "A7DS", "A9TV", "A4D5", "A9TY", "A7CX", "A9TW", "A86F"), 
                           count = c(50, 5, 65, 22, 18, 25, 27, 86, 24, 20, 48, 96, 60, 27, 81, 34, 43, 58, 31, 77, 160, 31, 157, 104, 84, 53, 153, 111, 278, 105))


ggplot(example_data, aes(cohort, count)) + 
  geom_boxplot(aes(color = cohort)) + 
  geom_point(aes(color = cohort)) +
  scale_y_log10() +
  labs(x = NULL) +
  theme(axis.line.x = element_blank(), axis.ticks.x = element_blank(), 
        axis.text.x = element_text(angle = 45, vjust = 0.5, hjust = 0.5), legend.position = 'none')

How could I reorder the points according to their y values (the "count" column in example_data), as in this plot?

[image: plot]

If you look at the example plot you showed of your desired output and consider the scales, there are basically two different layers:

  1. Overall: the x axis is a category ("DKFZ", "Sanger", "SMuFin"...) and the y axis is the value used for the boxplot.

  2. Within each boxplot: the x axis is some other continuous value and the y axis is the same value used as the y axis of the boxplot.

This means that the x axis within each boxplot is different from the x axis used for the plot as a whole. You effectively want a "secondary x axis". Leaving aside whether this is a good idea, I can show you one approach for doing this in ggplot2.

A secondary x axis of this kind, nested inside each category, is not a built-in feature of ggplot2; however, since one of your desired axes is categorical/discrete (example_data$cohort) and the other is continuous (example_data$count), we can simulate the effect of two x axes with some clever formatting of facets.

The general idea is that we separate your plot into facets based on cohort, then within each facet we show a boxplot for the whole group (grouped by cohort) and plot the points. This means our x axis value is count, the same as the y axis value. I assume that in your real data the two axis values would not be the same, but it works for example purposes. Then, we can use some theme elements and options regarding the facet labels (referred to as strip.text elements in ggplot2) to simulate the same look. I'm also switching to theme_classic() by default, since otherwise you have to deal with x gridlines that won't make sense in the final plot. If you want the vertical lines, you'll have to place them manually or programmatically based on your data.

Normally, facets are spaced apart, but I'm pushing them together via panel.spacing.x.

It's useful to compare the plots side-by-side, so note that I'm using cowplot::plot_grid() to arrange the old and new plots for demonstration purposes here.

One very important note is that I'm adding outlier.shape = NA to the call to geom_boxplot(). This is important because by default geom_boxplot() draws any outliers as points, and they would be in the "incorrect" x position. Since geom_point() already places every point where we want it, the boxplot's own outlier points need to be removed like this.
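To see which points would otherwise be drawn a second time, here is a rough sketch using base R's boxplot.stats() on the DLBC counts from the question. It is only an approximation of what geom_boxplot() does: both use a 1.5 × IQR rule, but boxplot.stats() computes its hinges slightly differently from the quartiles ggplot2 uses, so borderline points may be classified differently. The variable names dlbc, stats_log and flagged are just illustrative.

```r
# DLBC counts copied from example_data in the question
dlbc <- c(160, 31, 157, 104, 84, 53, 153, 111, 278, 105)

# The plot uses scale_y_log10(), so the boxplot statistics
# are computed on the log10-transformed values
stats_log <- boxplot.stats(log10(dlbc))

# Back-transform any values flagged as outliers to the count scale
flagged <- 10^stats_log$out
print(flagged)
```

Any value listed here is a candidate for being plotted twice (once by geom_boxplot() and once by geom_point()) unless outlier.shape = NA is set.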

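# Old plot: the code from the question, with a title added
p <- ggplot(example_data, aes(cohort, count)) +
  geom_boxplot(aes(color = cohort)) +
  geom_point(aes(color = cohort)) +
  scale_y_log10() +
  labs(title = 'Old Plot', x = NULL) +
  theme(axis.line.x = element_blank(), axis.ticks.x = element_blank(),
        axis.text.x = element_text(angle = 45, vjust = 0.5, hjust = 0.5), legend.position = 'none')

# New plot: one facet per cohort, with count mapped to both x and y
p1 <- ggplot(example_data, aes(count, count)) +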
  geom_boxplot(aes(color=cohort), outlier.shape = NA) +
  geom_point(aes(color=cohort)) +
  facet_wrap(~cohort, scales='free_x', strip.position = 'bottom') +
  scale_y_log10() +
  labs(title='New Plot', x=NULL) +
  theme_classic() +
  theme(
    panel.spacing.x = unit(0,'pt'),
    axis.text.x = element_blank(),
    strip.placement = 'outside',
    strip.background = element_blank(),
    axis.ticks.x = element_blank()
  )

library(cowplot)
plot_grid(p, p1)
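As a possible variation (not something the code above does), the within-facet x value could be the rank of each count inside its cohort rather than the count itself, so the points are ordered by value but evenly spaced. A minimal sketch of computing such a column with base R, using the first three ACC and CHOL counts from the question; the names small and rank_in_cohort are made up for illustration:

```r
# First three ACC and CHOL rows from example_data, for brevity
small <- data.frame(
  cohort = c("ACC", "ACC", "ACC", "CHOL", "CHOL", "CHOL"),
  count  = c(50, 5, 65, 48, 96, 60)
)

# Rank each count within its own cohort;
# ties.method = "first" gives tied values distinct ranks
small$rank_in_cohort <- ave(
  small$count, small$cohort,
  FUN = function(v) rank(v, ties.method = "first")
)

print(small)
```

Such a column could then be mapped to x in place of count, for example aes(rank_in_cohort, count). I'm assuming the rest of the plot code stays unchanged, so treat this as a starting point rather than a tested replacement.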



 