
Split violin plot with ggplot2

I'd like to create a split violin density plot using ggplot, like the fourth example on this page of the seaborn documentation.


Here is some data:

set.seed(20160229)

my_data = data.frame(
    y=c(rnorm(1000), rnorm(1000, 0.5), rnorm(1000, 1), rnorm(1000, 1.5)),
    x=c(rep('a', 2000), rep('b', 2000)),
    m=c(rep('i', 1000), rep('j', 2000), rep('i', 1000))
)

I can plot dodged violins like this:

library('ggplot2')

ggplot(my_data, aes(x, y, fill=m)) +
  geom_violin()


But it's hard to visually compare the widths at different points in the side-by-side distributions. I haven't been able to find any examples of split violins in ggplot. Is it possible?

I found a base R graphics solution, but the function is quite long. I also want to highlight the distribution modes, which are easy to add as extra layers in ggplot but would be harder to add if I had to figure out how to edit that function.
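As a rough illustration of what such a mode layer could start from (not part of the original question), the sketch below estimates each group's density mode in base R. `density_mode` is a hypothetical helper, and taking the peak of `density()` with its default bandwidth is only one assumption about how to define a mode. The resulting data frame could then be drawn over the violins as an extra layer, for example with `geom_point()`.

```r
# Hypothetical helper: location of the highest point of a kernel density estimate
density_mode <- function(v) {
  d <- density(v)            # default Gaussian kernel on a 512-point grid
  d$x[which.max(d$y)]        # grid value where the estimated density peaks
}

set.seed(20160229)
my_data <- data.frame(
  y = c(rnorm(1000), rnorm(1000, 0.5), rnorm(1000, 1), rnorm(1000, 1.5)),
  x = c(rep('a', 2000), rep('b', 2000)),
  m = c(rep('i', 1000), rep('j', 2000), rep('i', 1000))
)

# One row per (x, m) combination; column y holds the estimated mode
modes <- aggregate(y ~ x + m, data = my_data, FUN = density_mode)
modes
```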

Or, to avoid fiddling with the densities, you could extend ggplot2's GeomViolin like this:

library(ggplot2)

# A Geom that draws only one half of each violin: odd-numbered groups are drawn
# to the left of the centre line, even-numbered groups to the right. This assumes
# the fill variable has exactly two levels at each x position.
# Relies on ggplot2 internals (create_quantile_segment_frame, ggname) as found in
# ggplot2 3.x, and on the plyr package for arrange().
GeomSplitViolin <- ggproto("GeomSplitViolin", GeomViolin,
  draw_group = function(self, data, ..., draw_quantiles = NULL) {
    # Left (xminv) and right (xmaxv) edges of the full violin at each y
    data <- transform(data,
                      xminv = x - violinwidth * (x - xmin),
                      xmaxv = x + violinwidth * (xmax - x))
    grp <- data[1, "group"]
    # Keep one edge: left half traced bottom-to-top, right half top-to-bottom
    newdata <- plyr::arrange(transform(data, x = if (grp %% 2 == 1) xminv else xmaxv),
                             if (grp %% 2 == 1) y else -y)
    # Close the outline through the centre line (discrete x positions are integers)
    newdata <- rbind(newdata[1, ], newdata, newdata[nrow(newdata), ], newdata[1, ])
    newdata[c(1, nrow(newdata) - 1, nrow(newdata)), "x"] <- round(newdata[1, "x"])

    if (length(draw_quantiles) > 0 && !scales::zero_range(range(data$y))) {
      stopifnot(all(draw_quantiles >= 0), all(draw_quantiles <= 1))
      quantiles <- ggplot2:::create_quantile_segment_frame(data, draw_quantiles)
      aesthetics <- data[rep(1, nrow(quantiles)), setdiff(names(data), c("x", "y")), drop = FALSE]
      aesthetics$alpha <- rep(1, nrow(quantiles))
      both <- cbind(quantiles, aesthetics)
      quantile_grob <- GeomPath$draw_panel(both, ...)
      ggplot2:::ggname("geom_split_violin",
                       grid::grobTree(GeomPolygon$draw_panel(newdata, ...), quantile_grob))
    } else {
      ggplot2:::ggname("geom_split_violin", GeomPolygon$draw_panel(newdata, ...))
    }
  }
)

# Layer constructor: same interface as geom_violin(), but draws GeomSplitViolin
geom_split_violin <- function(mapping = NULL, data = NULL, stat = "ydensity", position = "identity", ...,
                              draw_quantiles = NULL, trim = TRUE, scale = "area", na.rm = FALSE,
                              show.legend = NA, inherit.aes = TRUE) {
  layer(data = data, mapping = mapping, stat = stat, geom = GeomSplitViolin,
        position = position, show.legend = show.legend, inherit.aes = inherit.aes,
        params = list(trim = trim, scale = scale, draw_quantiles = draw_quantiles, na.rm = na.rm, ...))
}

And use the new geom_split_violin like this:

ggplot(my_data, aes(x, y, fill = m)) + geom_split_violin()
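The core of `draw_group` is how it turns one half of a violin into a closed polygon: the left half (odd group) is traced bottom to top along `xminv`, the right half (even group) top to bottom along `xmaxv`, and the outline is closed through the centre line. Below is a base-R sketch of just that vertex bookkeeping on toy numbers. `half_violin_outline` is a hypothetical name, and it takes the centre explicitly, whereas the code above uses `round()` on the first vertex because discrete x positions sit at integers.

```r
# Hypothetical helper mirroring the vertex bookkeeping in GeomSplitViolin$draw_group.
# outline: data frame with columns y, xminv, xmaxv (left/right violin edge at each y)
# centre:  x position of the violin's centre line
half_violin_outline <- function(outline, centre, left = TRUE) {
  pts <- data.frame(
    x = if (left) outline$xminv else outline$xmaxv,
    y = outline$y
  )
  # Left half runs bottom-to-top, right half top-to-bottom
  pts <- pts[order(if (left) pts$y else -pts$y), ]
  # Duplicate the first and last vertex, then return to the first one
  pts <- rbind(pts[1, ], pts, pts[nrow(pts), ], pts[1, ])
  n <- nrow(pts)
  # Pin the added vertices to the centre line so both halves share that edge
  pts[c(1, n - 1, n), "x"] <- centre
  rownames(pts) <- NULL
  pts
}

# Toy outline: a symmetric bump of half-width w around centre 1 (made-up numbers)
y <- seq(-2, 2, length.out = 5)
w <- 0.4 * exp(-y^2 / 2)
outline <- data.frame(y = y, xminv = 1 - w, xmaxv = 1 + w)

left  <- half_violin_outline(outline, centre = 1, left = TRUE)
right <- half_violin_outline(outline, centre = 1, left = FALSE)
left
```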


Note: I think the answer by jan-glx is much better, and most people should use that instead.


You can achieve this by calculating the densities yourself beforehand and then plotting polygons. See below for a rough idea.

Get densities

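library(dplyr)
# One density estimate per (x, m) group, stored as a long table of (loc, dens) pairs;
# density() is computed once per group and reused for both columns
pdat <- my_data %>%
  group_by(x, m) %>%
  do({
    d <- density(.$y)
    data.frame(loc = d$x, dens = d$y)
  })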
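dplyr's `do()` is marked as superseded in current dplyr, so if you'd rather not depend on it, the same long table of density curves can be built in base R. This is a sketch; `group_density` is a hypothetical helper name, and the columns are named to match `pdat` above.

```r
set.seed(20160229)
my_data <- data.frame(
  y = c(rnorm(1000), rnorm(1000, 0.5), rnorm(1000, 1), rnorm(1000, 1.5)),
  x = c(rep('a', 2000), rep('b', 2000)),
  m = c(rep('i', 1000), rep('j', 2000), rep('i', 1000))
)

# Hypothetical helper: estimate one group's density and return it as a data frame
group_density <- function(d) {
  dens <- density(d$y)   # computed once; reuse the grid ($x) and heights ($y)
  data.frame(x = d$x[1], m = d$m[1], loc = dens$x, dens = dens$y)
}

# Split by every (x, m) combination, estimate each density, and stack the results
pieces <- split(my_data, list(my_data$x, my_data$m), drop = TRUE)
pdat <- do.call(rbind, lapply(pieces, group_density))
rownames(pdat) <- NULL
head(pdat)
```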

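Flip and offset densities for the groups

# Mirror group 'i' to the left of its centre line
pdat$dens <- ifelse(pdat$m == 'i', pdat$dens * -1, pdat$dens)
# Shift x = 'b' one unit right, so 'a' is centred at 0 and 'b' at 1
pdat$dens <- ifelse(pdat$x == 'b', pdat$dens + 1, pdat$dens)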
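The two `ifelse()` lines hard-code the level names 'i' and 'b'. A slightly more general sketch (my assumption, not part of the original answer) mirrors the first level of `m` to the left and offsets each x category by its position, so it also works with more than two x categories. The toy `pdat` below has the same columns as the real one but made-up values.

```r
# Toy long-format density table with the same columns as pdat (values are made up)
pdat <- data.frame(
  x    = c('a', 'a', 'b', 'b'),
  m    = c('i', 'j', 'j', 'i'),
  loc  = c(0, 0, 1, 1),
  dens = c(0.4, 0.3, 0.2, 0.1)
)

# Mirror the first level of m to the left of its centre line
first_level <- sort(unique(pdat$m))[1]
pdat$dens <- ifelse(pdat$m == first_level, -pdat$dens, pdat$dens)

# Shift each x category to its own centre: first level at 0, second at 1, ...
pdat$centre <- as.integer(factor(pdat$x)) - 1
pdat$dens <- pdat$dens + pdat$centre
pdat
```

With this layout the axis labels would then come from something like `scale_x_continuous(breaks = 0:(length(unique(pdat$x)) - 1), labels = sort(unique(pdat$x)))`.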

Plot

ggplot(pdat, aes(dens, loc, fill = m, group = interaction(m, x))) + 
  geom_polygon() +
  scale_x_continuous(breaks = 0:1, labels = c('a', 'b')) +
  ylab('y') +
  theme_minimal() +
  theme(axis.title.x = element_blank())

Result

[Image: the resulting split violin plot]

It is now possible to do this with the geom_split_violin function from the introdataviz package, which makes these plots easy to create. Here is a reproducible example:

set.seed(20160229)
my_data = data.frame(
  y=c(rnorm(1000), rnorm(1000, 0.5), rnorm(1000, 1), rnorm(1000, 1.5)),
  x=c(rep('a', 2000), rep('b', 2000)),
  m=c(rep('i', 1000), rep('j', 2000), rep('i', 1000))
)

library(ggplot2)
# devtools::install_github("psyteachr/introdataviz")
library(introdataviz)

ggplot(my_data, aes(x = x, y = y, fill = m)) +
  geom_split_violin()

Created on 2022-08-24 with reprex v2.0.2

As you can see, it creates a split violin plot. For more information and a tutorial, see the introdataviz package's documentation.

@jan-glx's solution is wonderful. For densities with thin tails, I'd like to insert a little space between the two halves of the violin so the tails are easier to tell apart. Here's a slight modification of @jan-glx's code that does this, borrowing the nudge parameter from the gghalves package:

GeomSplitViolin <- ggplot2::ggproto(
    "GeomSplitViolin",
    ggplot2::GeomViolin,
    draw_group = function(self,
                          data,
                          ...,
                          nudge = 0,
                          draw_quantiles = NULL) {
        data <- transform(data,
                          xminv = x - violinwidth * (x - xmin),
                          xmaxv = x + violinwidth * (xmax - x))
        grp <- data[1, "group"]
        newdata <- plyr::arrange(transform(data,
                                           x = if (grp %% 2 == 1) xminv else xmaxv),
                                 if (grp %% 2 == 1) y else -y)
        newdata <- rbind(newdata[1, ],
                         newdata,
                         newdata[nrow(newdata), ],
                         newdata[1, ])
        newdata[c(1, nrow(newdata)-1, nrow(newdata)), "x"] <- round(newdata[1, "x"])

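        # Open a gap of 2 * nudge between the halves: odd (left) groups move left,
        # even (right) groups move right
        newdata$x <- ifelse(newdata$group %% 2 == 1,
                            newdata$x - nudge,
                            newdata$x + nudge)

        if (length(draw_quantiles) > 0 && !scales::zero_range(range(data$y))) {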

            stopifnot(all(draw_quantiles >= 0), all(draw_quantiles <= 1))

            quantiles <- ggplot2:::create_quantile_segment_frame(data,
                                                             draw_quantiles)
            aesthetics <- data[rep(1, nrow(quantiles)),
                               setdiff(names(data), c("x", "y")),
                               drop = FALSE]
            aesthetics$alpha <- rep(1, nrow(quantiles))
            both <- cbind(quantiles, aesthetics)
            quantile_grob <- ggplot2::GeomPath$draw_panel(both, ...)
            ggplot2:::ggname("geom_split_violin",
                             grid::grobTree(ggplot2::GeomPolygon$draw_panel(newdata, ...),
                                            quantile_grob))
        } else {
            ggplot2:::ggname("geom_split_violin",
                             ggplot2::GeomPolygon$draw_panel(newdata, ...))
        }
    }
)

geom_split_violin <- function(mapping = NULL,
                              data = NULL,
                              stat = "ydensity",
                              position = "identity",
                              nudge = 0,
                              ...,
                              draw_quantiles = NULL,
                              trim = TRUE,
                              scale = "area",
                              na.rm = FALSE,
                              show.legend = NA,
                              inherit.aes = TRUE) {

    ggplot2::layer(data = data,
                   mapping = mapping,
                   stat = stat,
                   geom = GeomSplitViolin,
                   position = position,
                   show.legend = show.legend,
                   inherit.aes = inherit.aes,
                   params = list(trim = trim,
                                 scale = scale,
                                 nudge = nudge,
                                 draw_quantiles = draw_quantiles,
                                 na.rm = na.rm,
                                 ...))
}
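The only new step compared with jan-glx's version is the nudge: vertices of odd-numbered groups (the left halves) move further left and those of even-numbered groups move right, which opens a gap of 2 * nudge at the centre line. A toy base-R illustration of that single rule follows; the vertex data frame is made up for illustration.

```r
# Toy polygon vertices for two halves that share a centre line at x = 1
verts <- data.frame(
  group = c(1, 1, 1, 2, 2, 2),
  x     = c(1, 0.7, 1, 1, 1.3, 1)
)
nudge <- 0.02

# Same rule as in draw_group: odd groups shift left, even groups shift right
verts$x <- ifelse(verts$group %% 2 == 1, verts$x - nudge, verts$x + nudge)

# Horizontal gap now separating the two halves at the centre line
gap <- min(verts$x[verts$group == 2]) - max(verts$x[verts$group == 1])
gap
```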

Here's the plot I get with geom_split_violin(nudge = 0.02):

[Image: split violin plot drawn with nudge = 0.02]
