
Visualizing the difference between two points with ggplot2

I want to visualize the difference between two points with a line/bar in ggplot2.

Suppose we have some data on income and spending as a time series. We would like to visualize not only them, but the balance (=income - spending) as well. Furthermore, we would like to indicate whether the balance was positive (=surplus) or negative (=deficit).

I have tried several approaches, but none of them produced a satisfying result. Here we go with a reproducible example.

# Load libraries and create LONG data example data.frame
library(dplyr)
library(ggplot2)
library(tidyr)

df <- data.frame(year  = rep(2000:2009, times=3),
                 var   = rep(c("income","spending","balance"), each=10),
                 value = c(0:9, 9:0, rep(c("deficit","surplus"), each=5)))
df

1.Approach with LONG data

Unsurprisingly, it doesn't work with LONG data, because the geom_linerange arguments ymin and ymax cannot be specified correctly. ymin=value, ymax=value is definitely the wrong way to go (expected behaviour). ymin=income, ymax=spending is obviously wrong, too, because those columns do not exist in LONG data (expected behaviour).

df %>% 
ggplot() + 
  geom_point(aes(x=year, y=value, colour=var)) +
  geom_linerange(aes(x=year, ymin=value, ymax=value, colour=net))

#>Error in function_list[[i]](value) : could not find function "spread"

2.Approach with WIDE data

I almost got it working with WIDE data. The plot looks good, but the legend for the geom_point(s) is missing (expected behaviour). Simply adding show.legend = TRUE to the two geom_point(s) doesn't solve the problem as it overprints the geom_linerange legend. Besides, I would rather have the geom_point lines of code combined in one (see 1.Approach).

df %>% 
  spread(var, value) %>% 
ggplot() + 
  geom_linerange(aes(x=year, ymin=spending, ymax=income, colour=balance)) +
  geom_point(aes(x=year, y=spending), colour="red", size=3) +
  geom_point(aes(x=year, y=income), colour="green", size=3) +
  ggtitle("income (green) - spending (red) = balance")
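
A side note on the reshape step: tidyr documents spread() as superseded by pivot_wider(). The sketch below repeats the reshape from this approach with pivot_wider() and converts income and spending back to numbers, since mixing numbers and strings in c() makes the whole value column character. The name wide is only illustrative, and stringsAsFactors = FALSE is added so the sketch behaves the same on R versions before 4.0.

```r
library(tidyr)

# Same LONG example data as above; value ends up as character because
# c() coerces the numbers once strings are mixed in.
df <- data.frame(year  = rep(2000:2009, times = 3),
                 var   = rep(c("income", "spending", "balance"), each = 10),
                 value = c(0:9, 9:0, rep(c("deficit", "surplus"), each = 5)),
                 stringsAsFactors = FALSE)

# One row per year, one column per var (illustrative name: wide).
wide <- pivot_wider(df, names_from = var, values_from = value)

# Convert the two numeric columns back from character.
wide$income   <- as.numeric(wide$income)
wide$spending <- as.numeric(wide$spending)

str(wide)
```

With numeric income and spending, ymin and ymax are mapped onto a continuous y axis rather than a discrete one.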


3.Approach using LONG and WIDE data

Combining the 1.Approach with the 2.Approach results in yet another unsatisfying plot. The legend does not differentiate between balance and var (=expected behaviour).

ggplot() + 
  geom_point(data=(df %>% filter(var=="income" | var=="spending")),
             aes(x=year, y=value, colour=var)) +
  geom_linerange(data=(df %>% spread(var, value)), 
                 aes(x=year, ymin=spending, ymax=income, colour=balance)) 


  • Any (elegant) way out of this dilemma?
  • Should I use some other geom instead of geom_linerange?
  • Is my data in the right format?

Try

ggplot(df[df$var != "balance", ]) + 
  geom_point(
    aes(x = year, y = value, fill = var), 
        size=3, pch = 21, colour = alpha("white", 0)) +
  geom_linerange(
    aes(x = year, ymin = income, ymax = spending, colour = balance), 
        data = spread(df, var, value)) +
  scale_fill_manual(values = c("green", "red"))

Output: [image]

The main idea is that we use two different types of aesthetics for colours (fill for the points, with the appropriate pch, and colour for the lines) so that we get separate legends for each.
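
As a variation on the same idea, here is a minimal sketch that assumes the data is rebuilt with numeric income and spending columns and that the balance label is derived from them. The data frames wide and long, the tie rule (equal values count as surplus) and the line colours are assumptions for illustration, not part of the original data.

```r
library(ggplot2)

# Hypothetical WIDE data: one row per year, numeric income and spending.
wide <- data.frame(year = 2000:2009, income = 0:9, spending = 9:0)
# Balance label derived from the numbers (assumption: a tie counts as surplus).
wide$balance <- ifelse(wide$income >= wide$spending, "surplus", "deficit")

# LONG copy for the points, built with base R.
long <- data.frame(
  year  = rep(wide$year, times = 2),
  var   = rep(c("income", "spending"), each = nrow(wide)),
  value = c(wide$income, wide$spending)
)

p <- ggplot() +
  # Lines are mapped to colour, so they get their own legend.
  geom_linerange(data = wide,
                 aes(x = year,
                     ymin = pmin(income, spending),
                     ymax = pmax(income, spending),
                     colour = balance)) +
  # Points are mapped to fill; shape 21 is a circle that can be filled.
  geom_point(data = long,
             aes(x = year, y = value, fill = var),
             shape = 21, size = 3, colour = "transparent") +
  scale_fill_manual(values = c(income = "green", spending = "red")) +
  scale_colour_manual(values = c(deficit = "firebrick", surplus = "darkgreen"))

print(p)
```

Naming the values in both manual scales ties each colour to a level explicitly, so the mapping does not depend on the alphabetical order of the levels.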
