简体   繁体   中英

R - ggplot2 - Add arrow if geom_errorbar outside limits when x-axis is a factor variable

I want to use geom_segment to replace error bars with arrows when the error exceeds a certain limit. I found a previous post that addresses this question: R - ggplot2 - Add arrow if geom_errorbar outside limits

The code works well, except that my x-axis is a factor variable instead of a numeric variable. Using position_dodge within the geom_segment statement makes the arrows start in the correct location, but it doesn't change the terminal point (xend) and all arrows point towards one central point on the x-axis instead of going straight up from the origins.

Instead of recoding the x-axis to be numeric (I will use this code to create many plots that have a range of x-axis values, with the last numeric value always ending in "+"), is there a way to correct this within geom_segment?

Code used:

data$OR.95U_u = ifelse(data$OR.95U > 10, 10 , NA)

ggplot(data, aes(x = numAlleles, y = OR, fill = Outcome)) + 
  geom_bar(position = position_dodge(.5), stat = "identity", width = .4, color = "black") + geom_hline(yintercept = 1, linetype = "dashed", color = "black") + 
  ylim(0,10) + geom_errorbar(aes(ymin=OR.95L, ymax=OR.95U), width=.2,position=position_dodge(.5)) +
  theme(legend.key = element_blank(), text = element_text(size = 11.5), legend.title = element_blank()) + 
  labs(x = "Number of rare alleles") +
  scale_fill_manual(values=c("chocolate1","coral1", "red2", "darkred")) + 
  geom_segment(aes(x = numAlleles, xend = numAlleles, y = OR, yend = OR.95U_u), position = position_dodge(.5), arrow = arrow(length = unit(0.3, "cm")))

Resulting figure

结果图

Ok, after investigating a bit, I didn't find a clean way of doing this, at it seems that position_dodge only change the x aes , and not the xend aes . position_nudge also don't work here, as it moves all the arrows at the same time. So I came with a dirty way of doing this. All we need is create a new variable with the desired xend position for the geom_segment . I try and came with a semi-automtized way of doing it, for any number of levels of the coloring variable, and also created a reproducible dataset to work with, as I'm sure this could be improved a lot by people with more knowledge than me. The code has inline comments expalining the steps:

library(tidyverse)

# dummy data (tried to replicate your plot data more or less accurately)
df <- tibble(
  numAlleles = rep(c("1", "2+"), each = 4),
  Outcome = rep(LETTERS[1:4], 2),
  OR = c(1.4, 1.5, 1.45, 2.3, 3.8, 4.2, 4.0, 1.55),
  OR.95U = c(1.9,2.1,1.9,3.8,12,12,12,12),
  OR.95L = c(0.9, 0.9, 0.9, 0.8, NA, NA,NA,NA)
) %>%
  mutate(
    OR.95U_u = if_else(OR.95U > 10, 10, NA_real_)
  )

# as it seems that position_dodge in a geom_segment only "dodge" the x aes and
# not the xend aes, we need to supply a custom xend. Also, we need to try
# to automatize the position, for more classes or different dodge widths.
# To do that, lets start with some parameters:
# position_dodge width
position_dodge_width <- 0.5
# number of bars per x axis class
bars_per_class <- length(unique(df$Outcome))
# total space available per class. In discrete vars, this is 1 au (arbitrary unit)
# for each class, but position_dodge only use the fraction of that unit
# indicated in the width parameter, so we need to calculate the real
# space available:
total_space_available <- 1 * position_dodge_width
# now we calculate the real bar width used by ggplot in these au, dividing the
# space available by the number of bars to plot for each class
bar_width_real <- (total_space_available / bars_per_class)
# position_dodge with discrete variables place bars to the left and to the right of the
# class au value, so we need to know when to place the xend to the left or
# to the right. Also, the number of bars has to be taken in to account, as
# in odd number of bars, one is located on the exact au value
if (bars_per_class%%2 == 0) {
  # we need an offset, as bars are wider than arrows, and we want them in the
  # middle of the bar
  offset_segment <- bar_width_real / 2
  # offset modifier to know when to substract or add the modifier
  offset_modifier <- c(rep(-1, bars_per_class%/%2), rep(1, bars_per_class%/%2))
  # we also need to know how meny bars to the left and how many to the right,
  # but, the first bar of each side is already taken in account with the offset,
  # so the bar modifier has to have one bar less for each side
  bar_width_modifier <- c(seq((bars_per_class%/%2-1), 0), seq(0, (bars_per_class%/%2-1)))
} else {
  # when odd number of columns, the offset is the same as the bar width
  offset_segment <- bar_width_real
  # and the modifiers have to have a middle zero value for the middle bar
  offset_modifier <- c(rep(-1, bars_per_class%/%2), 0, rep(1, bars_per_class%/%2))
  bar_width_modifier <- c(seq((bars_per_class%/%2-1), 0), 0, seq(0, (bars_per_class%/%2-1)))
}

# finally we create the vector of xend values needed:
df %>%
  mutate(
    numAlleles_u = as.numeric(as.factor(numAlleles)) + offset_modifier*(offset_segment + (bar_width_modifier*bar_width_real))
  )

ggplot(df, aes(x = numAlleles, y = OR, fill = Outcome)) + 
  geom_bar(
    position = position_dodge(position_dodge_width), stat = "identity",
    width = 0.4, color = "black"
  ) +
  geom_hline(yintercept = 1, linetype = "dashed", color = "black") +
  ylim(0,10) +
  geom_errorbar(
    aes(ymin=OR.95L, ymax=OR.95U), width=.2,position=position_dodge(position_dodge_width)
  ) +
  theme(
    legend.key = element_blank(), text = element_text(size = 11.5),
    legend.title = element_blank()
  ) + 
  labs(x = "Number of rare alleles") +
  scale_fill_manual(values=c("chocolate1","coral1", "red2", "darkred")) + 
  geom_segment(
    aes(x = numAlleles, xend = numAlleles_u, y = OR, yend = OR.95U_u),
    position = position_dodge(position_dodge_width), arrow = arrow(length = unit(0.3, "cm"))
  )

And the plot: 四杠

We can check that for three levels discrete variables also works:

df_three_bars <- df %>% filter(Outcome != 'D')
bars_per_class <- length(unique(df_three_bars$Outcome))
total_space_available <- 1 * position_dodge_width
bar_width_real <- (total_space_available / bars_per_class)
if (bars_per_class%%2 == 0) {
  offset_segment <- bar_width_real / 2
  offset_modifier <- c(rep(-1, bars_per_class%/%2), rep(1, bars_per_class%/%2))
  bar_width_modifier <- c(seq((bars_per_class%/%2-1), 0), seq(0, (bars_per_class%/%2-1)))
} else {
  offset_segment <- bar_width_real
  offset_modifier <- c(rep(-1, bars_per_class%/%2), 0, rep(1, bars_per_class%/%2))
  bar_width_modifier <- c(seq((bars_per_class%/%2-1), 0), 0, seq(0, (bars_per_class%/%2-1)))
}
df_three_bars <- df_three_bars %>%
  mutate(
    numAlleles_u = as.numeric(as.factor(numAlleles)) + offset_modifier*(offset_segment + (bar_width_modifier*bar_width_real))
  )

ggplot(df_three_bars, aes(x = numAlleles, y = OR, fill = Outcome)) + 
  geom_bar(
    position = position_dodge(position_dodge_width), stat = "identity",
    width = 0.4, color = "black"
  ) +
  geom_hline(yintercept = 1, linetype = "dashed", color = "black") +
  ylim(0,10) +
  geom_errorbar(
    aes(ymin=OR.95L, ymax=OR.95U), width=.2,position=position_dodge(position_dodge_width)
  ) +
  theme(
    legend.key = element_blank(), text = element_text(size = 11.5),
    legend.title = element_blank()
  ) + 
  labs(x = "Number of rare alleles") +
  scale_fill_manual(values=c("chocolate1","coral1", "red2", "darkred")) + 
  geom_segment(
    aes(x = numAlleles, xend = numAlleles_u, y = OR, yend = OR.95U_u),
    position = position_dodge(position_dodge_width), arrow = arrow(length = unit(0.3, "cm"))
  )

在此处输入图像描述

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM