
ggplot2: Adding sample size information to x-axis tick labels

This question is related to "Create custom geom to compute summary statistics and display them *outside* the plotting region". (Note: all functions have been simplified; there are no checks for correct object types, NAs, etc.)

In base R, it is quite easy to write a function that produces a strip chart with the sample size indicated below each level of the grouping variable. You can add the sample size information using the mtext() function:

stripchart_w_n_ver1 <- function(data, x.var, y.var) {
    x <- factor(data[, x.var])
    y <- data[, y.var]
# Need to call plot.default() instead of plot because 
# plot() produces boxplots when x is a factor.
    plot.default(x, y, xaxt = "n",  xlab = x.var, ylab = y.var)
    levels.x <- levels(x)
    x.ticks <- 1:length(levels(x))
    axis(1, at = x.ticks, labels = levels.x)
    n <- sapply(split(y, x), length)
    mtext(paste0("N=", n), side = 1, line = 2, at = x.ticks)
}

stripchart_w_n_ver1(mtcars, "cyl", "mpg")
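The counts in stripchart_w_n_ver1() come from sapply(split(y, x), length), which also counts missing values. As a minimal sketch (the helper name count_non_missing is hypothetical and not part of the original functions), the per-group sample size could be restricted to non-missing observations like this:

```r
# Hypothetical helper: per-group count of non-missing y values.
count_non_missing <- function(data, x.var, y.var) {
    x <- factor(data[, x.var])
    y <- data[, y.var]
    # tapply() applies the counting function within each level of x
    tapply(y, x, function(v) sum(!is.na(v)))
}

n <- count_non_missing(mtcars, "cyl", "mpg")
labels <- paste0("N=", n)
```

The resulting labels vector can be passed to mtext() or axis() in place of paste0("N=", n) in the functions shown here.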

Or you can add the sample size information to the x-axis tick labels using the axis() function:

stripchart_w_n_ver2 <- function(data, x.var, y.var) {
    x <- factor(data[, x.var])
    y <- data[, y.var]
# Need to set the second element of mgp to 1.5 
# to allow room for two lines for the x-axis tick labels.
    o.par <- par(mgp = c(3, 1.5, 0))
    on.exit(par(o.par))
# Need to call plot.default() instead of plot because 
# plot() produces boxplots when x is a factor.
    plot.default(x, y, xaxt = "n", xlab = x.var, ylab = y.var)
    n <- sapply(split(y, x), length)
    levels.x <- levels(x)
    axis(1, at = 1:length(levels.x), labels = paste0(levels.x, "\nN=", n))
}

stripchart_w_n_ver2(mtcars, "cyl", "mpg")

[Figure: example using axis()]

While this is a very easy task in base R, it is maddeningly complex in ggplot2, because it is very hard to get at the data being used to generate the plot. And while there are functions equivalent to axis() (e.g., scale_x_discrete()), there is no equivalent to mtext() that lets you easily place text at specified coordinates within the margins.

I tried using the built-in stat_summary() function to compute the sample sizes (i.e., fun.y = "length") and then place that information on the x-axis tick labels. As far as I can tell, though, you can't extract the sample sizes and then add them to the x-axis tick labels via scale_x_discrete(); you have to tell stat_summary() which geom to use. You could set geom = "text", but then you have to supply the labels, and the whole point is that the labels should be the sample sizes, which stat_summary() computes but which you can't get at. You would also have to specify where the text should go, and it is difficult to work out a position that puts it directly underneath the x-axis tick labels.
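For what it is worth, stat_summary() can hand computed values to geom = "text" when the summary function is passed as fun.data and returns a data frame that includes a label column. The sketch below assumes that idiom; n_label is a hypothetical helper name. The label is still drawn inside the panel (just below each group's minimum), so this does not solve the margin problem:

```r
library(ggplot2)

# Hypothetical summary function: returns one row per group holding the
# y position of the label and the text to draw there.
n_label <- function(y) {
    data.frame(y = min(y), label = paste0("n=", length(y)))
}

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
    geom_point() +
    stat_summary(fun.data = n_label, geom = "text", vjust = 1.5)
```

Printing p draws the points with the count under the lowest point of each group.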

The vignette "Extending ggplot2" (http://docs.ggplot2.org/dev/vignettes/extending-ggplot2.html) shows how to create your own stat that gets directly at the data. The problem is that you always have to define a geom to go with your stat (i.e., ggplot assumes you want to draw this information inside the plot, not in the margins). As far as I can tell, you can't take the information computed in a custom stat, draw nothing in the plot area, and instead pass the information to a scale function such as scale_x_discrete(). Here is my attempt at doing it this way; the best I could do was place the sample size information at the minimum value of y for each group:

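library(ggplot2)

StatN <- ggproto("StatN", Stat,
    required_aes = c("x", "y"),
    compute_group = function(data, scales) {
        # Count the non-missing y values in this group
        y <- data$y
        y <- y[!is.na(y)]
        n <- length(y)
        # Place the label at the group's smallest y value
        data.frame(x = data$x[1], y = min(y), label = paste0("n=", n))
    }
)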

stat_n <- function(mapping = NULL, data = NULL, geom = "text", 
    position = "identity", inherit.aes = TRUE, show.legend = NA, 
        na.rm = FALSE, ...) {
    ggplot2::layer(stat = StatN, mapping = mapping, data = data, geom = geom, 
        position = position, inherit.aes = inherit.aes, show.legend = show.legend, 
        params = list(na.rm = na.rm, ...))
}

ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_point() + stat_n()
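One possible refinement, offered as a sketch and not as a tested solution to the margin problem: a stat can return y = -Inf, which ggplot2 draws at the bottom edge of the panel. Because compute_group() runs separately for each panel and group, the counts also follow the faceting. The names StatNBottom and stat_n_bottom are hypothetical, and the vjust value is a guess:

```r
library(ggplot2)

# Hypothetical variant of StatN: same count, but anchored at the bottom
# edge of the panel instead of at the group's minimum.
StatNBottom <- ggproto("StatNBottom", Stat,
    required_aes = c("x", "y"),
    compute_group = function(data, scales) {
        n <- sum(!is.na(data$y))
        data.frame(x = data$x[1], y = -Inf, label = paste0("n=", n))
    }
)

stat_n_bottom <- function(mapping = NULL, data = NULL, ...) {
    layer(stat = StatNBottom, geom = "text", mapping = mapping, data = data,
        position = "identity", inherit.aes = TRUE, show.legend = FALSE,
        # vjust = -0.5 lifts the text just above the panel's bottom edge
        params = list(na.rm = FALSE, vjust = -0.5, ...))
}

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
    geom_point() +
    stat_n_bottom() +
    facet_wrap(~am)
```

The labels still sit inside the panel, so low data points may overlap them unless the y-axis is expanded.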


I thought I had solved the problem by simply creating a wrapper function around ggplot():

ggstripchart <- function(data, x.name, y.name,  
    point.params = list(), 
    x.axis.params = list(labels = levels(x)), 
    y.axis.params = list(), ...) {
    if(!is.factor(data[, x.name]))
        data[, x.name] <- factor(data[, x.name])
    x <- data[, x.name]
    y <- data[, y.name]
    params <- list(...)
    point.params    <- modifyList(params, point.params)
    x.axis.params   <- modifyList(params, x.axis.params)
    y.axis.params   <- modifyList(params, y.axis.params)

    point <- do.call("geom_point", point.params)

    stripchart.list <- list(
        point, 
        theme(legend.position = "none")
    )

    n <- sapply(split(y, x), length)
    x.axis.params$labels <- paste0(x.axis.params$labels, "\nN=", n)
    x.axis <- do.call("scale_x_discrete", x.axis.params)
    y.axis <- do.call("scale_y_continuous", y.axis.params)
    stripchart.list <- c(stripchart.list, x.axis, y.axis)           

    ggplot(data = data, mapping = aes_string(x = x.name, y = y.name)) + stripchart.list
}


ggstripchart(mtcars, "cyl", "mpg")

[Figure: example using ggstripchart()]

However, this function does not work correctly with faceting. For example:

ggstripchart(mtcars, "cyl", "mpg") + facet_wrap(~am)

shows, in each facet, the sample sizes for both facets combined. I would have to build faceting into the wrapper function, which defeats the point of trying to use everything ggplot has to offer.

[Figure: example using ggstripchart() with facet_wrap]
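The reason is that the wrapper computes n once from the full data frame, before facet_wrap() splits the data. A quick check in base R (a sketch, independent of the wrapper) shows how the two sets of counts differ:

```r
# Counts per cyl level over the whole data set: what the wrapper computes
overall <- table(mtcars$cyl)

# Counts per cyl level within each level of am: what each facet should show
by_facet <- table(am = mtcars$am, cyl = mtcars$cyl)

print(overall)
print(by_facet)
```

Each column of by_facet sums to the corresponding entry of overall, which is the combined number that ends up in every facet's labels.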

If anyone has any insights into this problem, I would be grateful. Thanks so much for your time!

My solution might be a little simple, but it works well.

Given an example with faceting by am, I start by creating labels using paste0() and \n.

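library(dplyr)
library(ggplot2)

# Count rows per cyl/am combination and build a two-line tick label
mtcars2 <- mtcars %>%
  group_by(cyl, am) %>%
  mutate(n = n(), label = paste0(cyl, "\nN = ", n)) %>%
  ungroup()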

I then use these labels instead of cyl in the ggplot code:

ggplot(mtcars2,
   aes(x = factor(label), y = mpg, color = factor(label))) + 
  geom_point() + 
  xlab('cyl') + 
  facet_wrap(~am, scales = 'free_x') +
  theme(legend.position = "none")

This produces a faceted plot in which each tick label shows the cyl value with that facet's sample size on the line below it.
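The same labels can probably be built a little more compactly with add_count() from dplyr, which appends the group size as a column n without a separate group_by() step. A sketch, assuming a dplyr version that provides add_count():

```r
library(dplyr)

mtcars2 <- mtcars %>%
  add_count(cyl, am) %>%                      # adds n = rows per cyl/am combination
  mutate(label = paste0(cyl, "\nN = ", n))    # two-line tick label
```

The label column can then be mapped to x exactly as in the plotting code.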


You can print the counts below the x-axis labels using geom_text if you turn off clipping, but you'll probably have to tweak the placement. I've included a "nudge" parameter for that in the code below. Also, the method below is intended for cases where all the facets (if any) are column facets.

I realize you ultimately want code that will work inside a new geom, but perhaps the examples below can be adapted for use in a geom.

library(ggplot2)
library(dplyr)

pgg = function(dat, x, y, facet=NULL, nudge=0.17) {

  # Convert x-variable to a factor
  dat[,x] = as.factor(dat[,x])

  # Plot points
  p = ggplot(dat, aes_string(x, y)) +
    geom_point(position=position_jitter(w=0.3, h=0)) + theme_bw() 

  # Summarise data to get counts by x-variable and (if present) facet variables.
  # group_by_() is deprecated, so group by column name (needs dplyr >= 1.0).
  nn = dat %>% group_by(across(all_of(c(facet, x)))) %>% tally()

  # If there are facets, add them to the plot
  if (!is.null(facet)) {
    p = p + facet_grid(paste("~", paste(facet, collapse="+")))
  }

  # Add counts as text labels (n is a column of nn, so refer to it directly)
  p = p + geom_text(data=nn, aes(label=paste0("N = ", n)),
                    y=min(dat[,y]) - nudge*1.05*diff(range(dat[,y])), 
                    colour="grey20", size=3.5) +
    theme(axis.title.x=element_text(margin=unit(c(1.5,0,0,0),"lines")))

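  # Turn off clipping for every panel (facet panels are named "panel-1-1" etc.),
  # then draw the plot on a fresh page
  p <- ggplot_gtable(ggplot_build(p))
  p$layout$clip[grepl("^panel", p$layout$name)] <- "off"
  grid::grid.newpage()
  grid::grid.draw(p)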

}

pgg(mtcars, "cyl", "mpg")
pgg(mtcars, "cyl", "mpg", facet=c("am","vs"))
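In more recent ggplot2 releases (3.0.0 and later, as far as I know), clipping can also be switched off with coord_cartesian(clip = "off"), which avoids editing the gtable and keeps the result an ordinary ggplot object. A minimal sketch under that assumption; the vjust value and the margin size are guesses that will likely need tuning:

```r
library(ggplot2)

# One row per cyl level with its count in column n
counts <- as.data.frame(table(cyl = mtcars$cyl), responseName = "n")

p <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_point() +
  # y = -Inf anchors the text at the bottom edge of the panel;
  # vjust > 1 pushes it down into the margin, below the tick labels
  geom_text(data = counts,
            aes(x = cyl, y = -Inf, label = paste0("N = ", n)),
            vjust = 3.5, size = 3.5, inherit.aes = FALSE) +
  coord_cartesian(clip = "off") +
  # Extra space above the axis title so it does not collide with the counts
  theme(axis.title.x = element_text(margin = margin(t = 20)))
```

Because p is still a ggplot object, further layers or facets can be added with + as usual.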


Another, potentially more flexible, option is to add the counts to the bottom of the plot panel. For example:

pgg = function(dat, x, y, facet_r=NULL, facet_c=NULL) {

  # Convert x-variable to a factor
  dat[,x] = as.factor(dat[,x])

  # Plot points
  p = ggplot(dat, aes_string(x, y)) +
    geom_point(position=position_jitter(w=0.3, h=0)) + theme_bw() 

  # Summarise data to get counts by x-variable and (if present) facet variables.
  # group_by_() is deprecated, so group by column name (needs dplyr >= 1.0).
  nn = dat %>% group_by(across(all_of(c(facet_r, facet_c, x)))) %>% tally()

  # If there are facets, add them to the plot
  if (!is.null(facet_r) || !is.null(facet_c)) {

    facets = paste(ifelse(is.null(facet_r),".",facet_r), " ~ " , 
                   ifelse(is.null(facet_c),".",facet_c))

    p = p + facet_grid(facets)
  }

  # Add counts as text labels (n is a column of nn, so refer to it directly)
  p + geom_text(data=nn, aes(label=paste0("N = ", n)),
                y=min(dat[,y]) - 0.15*min(dat[,y]), colour="grey20", size=3) +
    scale_y_continuous(limits=range(dat[,y]) + c(-0.1*min(dat[,y]), 0.01*max(dat[,y])))
}

pgg(mtcars, "cyl", "mpg")
pgg(mtcars, "cyl", "mpg", facet_c="am")
pgg(mtcars, "cyl", "mpg", facet_c="am", facet_r="vs")
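A variation on the same idea, sketched under the assumption that ggplot2 >= 3.3.0 is available (for expansion()): anchoring the labels at y = -Inf pins them to the bottom edge of every panel regardless of the data range, and extra scale expansion at the bottom leaves room for them. The expansion and vjust values are guesses:

```r
library(ggplot2)
library(dplyr)

# Counts per facet (am) and x level (cyl)
counts <- mtcars %>% count(am, cyl)

p <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_point(position = position_jitter(width = 0.3, height = 0)) +
  # y = -Inf places the text at the panel's bottom edge; vjust lifts it inside
  geom_text(data = counts,
            aes(x = factor(cyl), y = -Inf, label = paste0("N = ", n)),
            vjust = -0.5, size = 3, colour = "grey20", inherit.aes = FALSE) +
  # Widen the lower end of the y-axis so points do not overlap the labels
  scale_y_continuous(expand = expansion(mult = c(0.12, 0.05))) +
  facet_grid(. ~ am) +
  theme_bw()
```

Since the counts data frame carries the facet variable, each panel receives only its own labels.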


I have updated the EnvStats package to include a stat called stat_n_text, which adds the sample size (the number of unique y-values) below each unique x-value. See the help file for stat_n_text for more information and a list of examples. Below is a simple example:

library(ggplot2)
library(EnvStats)

p <- ggplot(mtcars, 
  aes(x = factor(cyl), y = mpg, color = factor(cyl))) + 
  theme(legend.position = "none")

p + geom_point() + 
  stat_n_text() + 
  labs(x = "Number of Cylinders", y = "Miles per Gallon")

[Figure: demonstration of stat_n_text]
