R - ggplot2 'dodge' geom_step() to overlap geom_bar()

Plotting counts using ggplot2's geom_bar(stat="identity") is an effective method of visualising counts. I would like to use this method to display my observed counts and compare them to expected counts. I would like to do this by using geom_step to overlay a stairstep plot layer over the barplot.

However, when I do this I run into the problem that barplots by default have their positions dodged but geom_step does not. For example, using both continuous and discrete dependent variables:

library(tidyverse)

test <- tibble(a = 1:10, b = runif(10, 1, 10))  # tibble() replaces the deprecated data_frame()

test_plot <- ggplot(test, aes(a, b)) + 
  geom_bar(stat="identity") + 
  geom_step(color = 'red')

test2 <- tibble(a = letters[1:10], b = runif(10, 1, 10))

# group = 1 is needed so that geom_step connects the discrete x values
test2_plot <- ggplot(test2, aes(a, b, group = 1)) + 
  geom_bar(stat="identity") + 
  geom_step(color = 'red')

gridExtra::grid.arrange(test_plot, test2_plot, ncol = 2)

[image]

As you can see, the two layers are offset, which is undesirable.

Reading the docs, I see that geom_path has a position = option. However, trying something like geom_step(color = 'red', position = position_dodge(width = 0.5)) does not do what I want; rather, it compresses the bars and the stairstep line towards the centre. Another option is to adjust the data directly, like this: geom_step(aes(a-0.5, b), color = 'red'), which produces a near-acceptable result for data with continuous dependent variables. You could also calculate the stairstep line as a function and plot it using stat_function().

[image]

However, these approaches are not applicable to data with discrete dependent variables, and my actual data has discrete dependent variables, so I need another answer.

Additionally, when shifted, the stairstep line will not cover the last bar, as seen in the above image. Is there an easy, elegant way to extend it to cover the last bar?

If geom_step() is the wrong approach and what I'm trying to get can be achieved in another way, I am interested in that too.

I think the most efficient way to solve this problem is to define a custom geom in the following way:

library(tidyverse)

geom_step_extend <- function(data, extend = 1, nudge = -0.5,
                             ...) {
  # Function for computing the last segment data
  get_step_extend_data <- function(data, extend = 1, nudge = -0.5) {
    data_out <- as.data.frame(data[order(data[[1]]), ])
    n <- nrow(data)
    max_x_y <- data_out[n, 2]
    if (is.numeric(data_out[[1]])) {
      max_x <- data_out[n, 1] + nudge
    } else {
      max_x <- n + nudge
    }

    data.frame(x = max_x,
               y = max_x_y,
               xend = max_x + extend,
               yend = max_x_y)
  }

  # The resulting geom
  list(
    geom_step(position = position_nudge(x = nudge), ...),
    geom_segment(
      data = get_step_extend_data(data, extend = extend, nudge = nudge),
      mapping = aes(x = x, y = y,
                    xend = xend, yend = yend),
      ...
    )
  )
}

set.seed(111)
test <- tibble(a = 1:10, b = runif(10, 1, 10))
test2 <- tibble(a = letters[1:10], b = runif(10, 1, 10))

test_plot <- ggplot(test, aes(a, b, group = 1)) + 
  geom_bar(stat = "identity") + 
  geom_step_extend(data = test, colour = "red")

test2_plot <- ggplot(test2, aes(a, b, group = 1)) + 
  geom_bar(stat = "identity") + 
  geom_step_extend(data = test2, colour = "red")

gridExtra::grid.arrange(test_plot, test2_plot, ncol = 2)

[image: example output]

Basically, this solution consists of three parts:

  1. Nudge the step curve to the left by the desired value (in this case -0.5) with position_nudge;
  2. Compute the data for the missing segment (the one on the right) with the function get_step_extend_data. Its behaviour is inspired by ggplot2:::stairstep, which is the underlying function of geom_step;
  3. Combine geom_step and geom_segment into a single geom using list.
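
For numeric x, a similar effect can be sketched at the data level, without geom_segment: repeat the last observation one unit further along x, so that the nudged step line gains a final horizontal segment. This is only a minimal sketch. The names last_row and test_ext are introduced here for illustration, and unit spacing between x values is assumed.

```r
library(ggplot2)

set.seed(111)
test <- data.frame(a = 1:10, b = runif(10, 1, 10))

# Copy the last row and move it one unit to the right (assumes unit spacing).
last_row <- test[nrow(test), ]
last_row$a <- last_row$a + 1
test_ext <- rbind(test, last_row)

# The bars use the original data; the step line uses the extended data,
# nudged left by half a unit so each step spans its bar.
p <- ggplot(test, aes(a, b)) +
  geom_col() +
  geom_step(data = test_ext, colour = "red",
            position = position_nudge(x = -0.5))
```

Call print(p) to draw the plot. This does not carry over directly to discrete x, which is why the answer above computes the extra segment from the number of rows instead.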

Here's a rather crude solution, but it should work in this case.

Create an alternate data frame that expands each row into three points, at x - 0.5, x, and x + 0.5:

# Each row of test becomes three points (x - 0.5, x, x + 0.5) sharing the same height
test2 <- data.frame(a = rep(test$a, each = 3) + c(-0.5, 0, 0.5),
                    b = rep(test$b, each = 3))

Plot the outline with geom_line:

ggplot(test, aes(a,b)) + geom_bar(stat="identity", alpha=.7) + geom_line(data=test2, colour="red")

[image]

This will look tidier if you set the geom_bar width to 1:

ggplot(test, aes(a,b)) + geom_bar(width=1, stat="identity", alpha=.7) + geom_line(data=test2, colour="red")

[image]
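
The same idea might be adapted to a discrete x axis by working with the integer positions of the categories. The following is a sketch under stated assumptions: test_discrete, pos, and outline are names introduced here for illustration, and the categories are assumed to appear on the axis in sorted order (the default for a character column). It relies on ggplot2 accepting numeric x values in a layer drawn on a discrete position scale.

```r
library(ggplot2)

set.seed(111)
test_discrete <- data.frame(a = letters[1:10], b = runif(10, 1, 10))

# Sort by category, then take each category's integer axis position (1, 2, ...).
test_discrete <- test_discrete[order(test_discrete$a), ]
pos <- as.integer(factor(test_discrete$a))

# Two vertices per bar: its left and right edge, both at the bar's height.
outline <- data.frame(
  x = c(rbind(pos - 0.5, pos + 0.5)),
  y = rep(test_discrete$b, each = 2)
)

# geom_path draws the vertices in data order, which traces the outline.
p <- ggplot(test_discrete, aes(a, b)) +
  geom_col(width = 1, alpha = 0.7) +
  geom_path(data = outline, aes(x, y), colour = "red")
```

Because the outline runs from 0.5 to 10.5, it also covers the first and last bars in full.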

Since ggplot2 version 3.3.0, this option is supported by geom_step using direction = "mid":

library(tidyverse)

test <- tibble(a = 1:10, b = runif(10, 1, 10))

test_plot <- ggplot(test, aes(a, b)) + 
  geom_bar(stat="identity") + 
  geom_step(color = 'red', direction = "mid", size = 2)

test_plot

[image]
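
Since the question concerns discrete x values, here is a minimal sketch of the same option on a character column. The name test_discrete is introduced here for illustration; group = 1 is carried over from the question so that the categories are joined into one line. As far as I can tell, the "mid" line starts at the first x position and ends at the last, so the outer halves of the first and last bars are left uncovered.

```r
library(ggplot2)

set.seed(111)
test_discrete <- data.frame(a = letters[1:10], b = runif(10, 1, 10))

# group = 1 joins all categories into a single step line.
p <- ggplot(test_discrete, aes(a, b, group = 1)) +
  geom_col() +
  geom_step(colour = "red", direction = "mid")
```

Call print(p) to draw the plot.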

I like molx's answer of using direction = 'mid' for geom_step() in ggplot2 version 3.3.0. However, for time series I recommend shifting the data used for the x-axis of the geom_bar() or geom_col() plot:

library(tidyverse)

data.frame(time = seq(as.POSIXct('2020-10-01 05:00'), 
                      as.POSIXct('2020-10-01 14:00'), by = 'hour'), 
           value = runif(10, 0, 100)) %>%
  # Shift the bar centres by half an hour (POSIXct arithmetic is in seconds)
  mutate(time_shift_bars = time + 30*60) %>% 
  ggplot(mapping = aes(y = value)) + 
  geom_step(color = 'red', mapping = aes(x = time)) +
  # Bars are one hour wide and centred on the shifted times
  geom_col(width = 60*60, mapping = aes(x = time_shift_bars))

![resulting plot](https://i.stack.imgur.com/fJBac.png)

I prefer this because, for example, 09:00 is a specific instant, and the data represents the average for the following hour. If your time-series data is not averaged like this, use the `direction` method.
