
How to plot boxplots superimposed with sorted points using ggplot2

Using ggplot2, I can plot a boxplot superimposed with points, but all the points in each group sit on a single vertical line.

library(ggplot2)

example_data <- data.frame(cohort = c("ACC", "ACC", "ACC", "ACC", "ACC", "ACC", "ACC", "ACC", "ACC", "ACC", "CHOL", "CHOL", "CHOL", "CHOL", "CHOL", "CHOL", "CHOL", "CHOL", "CHOL", "CHOL", "DLBC", "DLBC", "DLBC", "DLBC", "DLBC", "DLBC", "DLBC", "DLBC", "DLBC", "DLBC"), 
                           sample = c("A5LI", "A5JQ", "A5JP", "A5LE", "A5LG", "A5JV", "A5JD", "A5J8", "A5K8", "A5L3", "AA33", "AA30", "AA2T", "A95A", "AAZT", "A8I3", "AAV9", "A8Y4", "A8Y8", "AA31", "AAAT", "A9U4", "A7Q1", "A7DS", "A9TV", "A4D5", "A9TY", "A7CX", "A9TW", "A86F"), 
                           count = c(50, 5, 65, 22, 18, 25, 27, 86, 24, 20, 48, 96, 60, 27, 81, 34, 43, 58, 31, 77, 160, 31, 157, 104, 84, 53, 153, 111, 278, 105))


ggplot(example_data, aes(cohort, count)) + 
  geom_boxplot(aes(color = cohort)) + 
  geom_point(aes(color = cohort)) +
  scale_y_log10() +
  labs(x = NULL) +
  theme(axis.line.x = element_blank(), axis.ticks.x = element_blank(), 
        axis.text.x = element_text(angle = 45, vjust = 0.5, hjust = 0.5), legend.position = 'none')
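To see why the points line up, one quick check (a sketch, not part of the original question) is to inspect the data ggplot2 computes for the point layer with `layer_data()`: on a discrete x axis, every point in a cohort is mapped to the same x position. The data frame below rebuilds only the `cohort` and `count` columns, and `p_points`, `pts` and `x_per_cohort` are illustrative names.

```r
library(ggplot2)

# Only the columns the plot uses (cohort, count) are rebuilt here
example_data <- data.frame(
  cohort = rep(c("ACC", "CHOL", "DLBC"), each = 10),
  count  = c(50, 5, 65, 22, 18, 25, 27, 86, 24, 20,
             48, 96, 60, 27, 81, 34, 43, 58, 31, 77,
             160, 31, 157, 104, 84, 53, 153, 111, 278, 105)
)

p_points <- ggplot(example_data, aes(cohort, count)) +
  geom_point(aes(color = cohort)) +
  scale_y_log10()

# layer_data() builds the plot and returns the computed data for one layer;
# on a discrete x axis each cohort is mapped to a single integer position
pts <- layer_data(p_points, 1)
x_per_cohort <- tapply(as.numeric(pts$x), pts$group, function(x) length(unique(x)))
print(x_per_cohort)
```

Each cohort has exactly one distinct x value, which is why its points form a vertical line.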

How could I reorder the points according to their y values (the count column in example_data), like in this plot?

[Image: the desired plot]

If you look at the example plot of your desired output and consider the scales, there are basically two different layers:

  1. Overall: the x axis is some category ("DKFZ", "Sanger", "SMuFin", ...) and the y axis is some value used for the boxplot.

  2. Within each boxplot: the x axis is some other continuous value, and the y axis is the same value used for the boxplot's y axis.
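If your data has no natural second x value, one option for the "within each boxplot" layer is each sample's rank within its cohort. This is an assumption for illustration only; the approach below simply reuses `count`. The sketch uses base R's `ave()` to apply `rank()` separately inside each cohort, and `within_rank` is a hypothetical column name.

```r
example_data <- data.frame(
  cohort = rep(c("ACC", "CHOL", "DLBC"), each = 10),
  count  = c(50, 5, 65, 22, 18, 25, 27, 86, 24, 20,
             48, 96, 60, 27, 81, 34, 43, 58, 31, 77,
             160, 31, 157, 104, 84, 53, 153, 111, 278, 105)
)

# Rank each count within its own cohort (1 = smallest value in that cohort);
# tied values would get their average rank, which is rank()'s default
example_data$within_rank <- ave(example_data$count, example_data$cohort, FUN = rank)

# Ordering by cohort and then count shows the left-to-right order the points
# would take inside each box
print(example_data[order(example_data$cohort, example_data$count), ])
```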

This means that the x axis within each boxplot is different from the x axis used for the plot as a whole. You essentially want a "secondary x axis". Setting aside whether this is a good idea, I can show you one way to do this in ggplot2.

An independent secondary x axis is not a built-in feature of ggplot2 (its sec.axis option can only show a transformation of the primary axis, not a different variable). However, since one of your desired axes is categorical/discrete (example_data$cohort) and the other is continuous (example_data$count), we can simulate the effect of two x axes with some clever formatting of facets.

The general idea is that we split your plot into facets based on cohort; then, within each facet, we draw a boxplot of that cohort's values and plot the individual points. This means our x axis value is count, the same as the y axis value. I assume that in your real data the two axis values would not be the same, but it works for example purposes. Then we can use some theme elements and options for the facet labels (called strips in ggplot2) to simulate the same look. I'm also switching to theme_classic() as the base theme, since otherwise you have to deal with x gridlines that won't make sense in the final plot. If you want vertical lines, you'll have to place them manually or programmatically based on your data.

Normally there is a gap between facets, but I'm removing it by setting panel.spacing.x to zero.

It's useful to compare the plots side by side, so note that I'm using cowplot::plot_grid() here to arrange the old and new plots for demonstration purposes.

One very important note: I'm adding outlier.shape = NA to the geom_boxplot() call. By default, geom_boxplot() draws any outliers as points, and they would sit at the "incorrect" x position (the box's position rather than the point's own x value). Since geom_point() already draws every point where we want it, the boxplot's own outlier points have to be hidden like this.
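To see what the boxplot would otherwise draw, you can inspect the statistics geom_boxplot() computes with `layer_data()`. In this minimal sketch (`p_box`, `box_stats` and `n_outliers` are illustrative names), the `outliers` list-column holds, for each box, the values that would be drawn as extra points at the box's x position. With `scale_y_log10()`, the statistics are in log10 units, because scale transformations are applied before the statistic is computed.

```r
library(ggplot2)

example_data <- data.frame(
  cohort = rep(c("ACC", "CHOL", "DLBC"), each = 10),
  count  = c(50, 5, 65, 22, 18, 25, 27, 86, 24, 20,
             48, 96, 60, 27, 81, 34, 43, 58, 31, 77,
             160, 31, 157, 104, 84, 53, 153, 111, 278, 105)
)

# Default outlier settings, so geom_boxplot() would draw its own outlier points
p_box <- ggplot(example_data, aes(cohort, count)) +
  geom_boxplot(aes(color = cohort)) +
  scale_y_log10()

box_stats <- layer_data(p_box, 1)
# One row per box: the hinges and median (log10 units here)
print(box_stats[, c("group", "lower", "middle", "upper")])
# Number of values per box that would be drawn again as outlier points
n_outliers <- lengths(box_stats$outliers)
print(n_outliers)
```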

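# The plot from the question, with a title added for the comparison
p <- ggplot(example_data, aes(cohort, count)) + 
  geom_boxplot(aes(color = cohort)) + 
  geom_point(aes(color = cohort)) +
  scale_y_log10() +
  labs(title = "Old Plot", x = NULL) +
  theme(axis.line.x = element_blank(), axis.ticks.x = element_blank(), 
        axis.text.x = element_text(angle = 45, vjust = 0.5, hjust = 0.5), legend.position = 'none')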

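# New plot: facet by cohort and use count on both axes, so the points
# inside each facet are spread left to right in order of their value
p1 <- ggplot(example_data, aes(count, count)) +
  # outlier.shape = NA hides the boxplot's own outlier points
  geom_boxplot(aes(color=cohort), outlier.shape = NA) +
  geom_point(aes(color=cohort)) +
  # one panel per cohort, each with its own x range, labels underneath
  facet_wrap(~cohort, scales='free_x', strip.position = 'bottom') +
  scale_y_log10() +
  labs(title='New Plot', x=NULL) +
  theme_classic() +
  theme(
    panel.spacing.x = unit(0,'pt'),      # remove the gap between facets
    axis.text.x = element_blank(),       # hide the per-facet count labels
    strip.placement = 'outside',         # put facet labels outside the axis
    strip.background = element_blank(),  # no box around facet labels
    axis.ticks.x = element_blank()
  )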

library(cowplot)
plot_grid(p, p1)

[Image: the old plot (left) and the new plot (right)]
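A different technique from the facet approach above, offered only as a sketch: keep the single discrete cohort axis and shift each point sideways according to its rank within its cohort. It relies on ggplot2 accepting continuous positions on a discrete position scale (the first level sits at x = 1, the second at x = 2, and so on). `box_width`, `n_in_cohort`, `within_rank`, `x_sorted` and `p2` are hypothetical names, and the 0.6 box width is an arbitrary choice.

```r
library(ggplot2)

example_data <- data.frame(
  cohort = rep(c("ACC", "CHOL", "DLBC"), each = 10),
  count  = c(50, 5, 65, 22, 18, 25, 27, 86, 24, 20,
             48, 96, 60, 27, 81, 34, 43, 58, 31, 77,
             160, 31, 157, 104, 84, 53, 153, 111, 278, 105)
)

box_width <- 0.6
example_data$cohort <- factor(example_data$cohort)

# Each point's x = its cohort's integer position plus an offset that grows
# with its rank inside the cohort, so points run from lowest (left) to
# highest (right) while staying inside the box width
n_in_cohort <- ave(example_data$count, example_data$cohort, FUN = length)
within_rank <- ave(example_data$count, example_data$cohort, FUN = rank)
example_data$x_sorted <- as.numeric(example_data$cohort) +
  ((within_rank - 0.5) / n_in_cohort - 0.5) * box_width

p2 <- ggplot(example_data, aes(y = count, color = cohort)) +
  # x = cohort sets up the discrete axis; outliers are hidden because every
  # sample is drawn by geom_point() below
  geom_boxplot(aes(x = cohort), width = box_width, outlier.shape = NA) +
  geom_point(aes(x = x_sorted)) +
  scale_y_log10() +
  labs(title = "Sorted points without facets", x = NULL) +
  theme_classic() +
  theme(legend.position = "none")
print(p2)
```

Because the offset comes from the rank rather than the raw count, the points are evenly spaced inside each box however spread out the values are; putting count itself into the offset would instead give the uneven spacing of the facet version.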
