
How to automate the legend in a ggplot chart?

Consider this simple example

library(dplyr)
library(forcats)
library(ggplot2)

mydata <- tibble(cat1 = c(1,1,2,2),
                 cat2 = c('a','b','a','b'),
                 value = c(10,20,-10,-20),
                 time = c(1,2,1,2))

mydata <- mydata %>% mutate(cat1 = factor(cat1),
                 cat2 = factor(cat2))

> mydata
# A tibble: 4 x 4
  cat1  cat2  value  time
  <fct> <fct> <dbl> <dbl>
1 1     a      10.0  1.00
2 1     b      20.0  2.00
3 2     a     -10.0  1.00
4 2     b     -20.0  2.00

Now, I want to create a chart where I interact the two factor variables. I know I can use interaction() inside a ggplot2 aesthetic (see below).

My big problem is that I do not know how to automate the labeling (and the colouring) of the interactions, so that I can avoid the manual errors that creep in when using scale_colour_manual().

For instance:

ggplot(mydata,
       aes(x = time, y = value, col = interaction(cat1, cat2))) +
  geom_point(size = 15) +
  scale_y_continuous(breaks = scales::pretty_breaks(n = 10)) +
  theme(legend.position = "bottom",
        legend.text = element_text(size = 12, face = "bold")) +
  scale_colour_manual(name = "",
                      values = c("red", "red4", "royalblue", "royalblue4"),
                      labels = c("1-b", "1-a", "2-a", "2-b"))

shows:

[image: resulting plot]

which has the wrong labels because of a deliberate mistake I made in scale_colour_manual(). Indeed, the bright red dot is 1-a and not 1-b (note how the labels are simply the concatenation of the two factor levels). The point is that with more factor levels, guessing the right order becomes error-prone.

Is there a way to automate this labeling (even better: labeling AND coloring)? Perhaps using forcats ? Perhaps creating the labels as strings in the dataframe beforehand?

Thanks!

If the number of factor levels for cat1 / cat2 is not fixed (and could potentially be much larger than 2), I would calculate the appropriate colours with hsv() rather than assign them manually.

The colour cheatsheet here summarises the HSV colour model rather nicely:

[image: colour wheel]

Hue (h) is essentially your rainbow colour wheel, Saturation (s) determines how intense the colour is, and Value (v) how dark it is. Each parameter accepts values in the range [0, 1].
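
To get a feel for the three parameters, here is a small sketch that calls hsv() directly. The variable names are only for illustration; hsv() itself comes with base R (package grDevices).

```r
# hsv() ships with base R (package grDevices), so no extra packages are needed.

# Hue picks the position on the colour wheel (saturation and value at full).
hues <- hsv(h = c(0, 1/3, 2/3), s = 1, v = 1)
print(hues)  # red, green, blue

# Lowering saturation and value together gives a muted, darker shade
# of the same hue.
bright.cyan <- hsv(h = 0.5, s = 1, v = 1)
muted.cyan  <- hsv(h = 0.5, s = 0.65, v = 0.65)
print(c(bright = bright.cyan, muted = muted.cyan))
```

The first call should return "#FF0000", "#00FF00" and "#0000FF"; the cyan pair corresponds to the "#00FFFF" and "#3AA6A6" values that appear in the colour vector further down.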

Here's how I would adapt it for this use case:

mydata2 <- mydata %>%

  # use "-" instead of the default "." since we are using that for the labels anyway
  mutate(interacted.variable = interaction(cat1, cat2, sep = "-")) %>%

  # cat1: assign hue evenly across the whole wheel,
  # cat2: restrict both saturation & value to the range (0.3, 1], as colours
  #       can look too faint / dark otherwise
  mutate(colour = hsv(h = as.integer(cat1) / length(levels(cat1)),
                      s = 0.3 + 0.7 * as.integer(cat2) / length(levels(cat2)),
                      v = 0.3 + 0.7 * as.integer(cat2) / length(levels(cat2))))

# create the vector of colours for scale_colour_manual()
manual.colour <- mydata2 %>% select(interacted.variable, colour) %>% unique()
colour.vector <- manual.colour$colour
names(colour.vector) <- manual.colour$interacted.variable
rm(manual.colour)

> colour.vector
      1-a       1-b       2-a       2-b 
"#3AA6A6" "#00FFFF" "#A63A3A" "#FF0000" 
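
Because scale_colour_manual() matches a named values vector to the factor levels by name, it can be worth checking that every level present in the data has an entry before plotting. A minimal sketch, using toy stand-ins for the objects above (the name missing.levels is made up for this example):

```r
# Toy stand-ins for the objects built above (assumed, for illustration only).
interacted <- interaction(factor(c(1, 1, 2, 2)),
                          factor(c("a", "b", "a", "b")),
                          sep = "-")
colour.vector <- c("1-a" = "#3AA6A6", "1-b" = "#00FFFF",
                   "2-a" = "#A63A3A", "2-b" = "#FF0000")

# Levels that occur in the data but have no colour assigned.
missing.levels <- setdiff(unique(as.character(interacted)),
                          names(colour.vector))

# Stop early with an error if any level would end up without a colour.
stopifnot(length(missing.levels) == 0)
```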

With the colours calculated automatically for any number of factors, plotting becomes quite straightforward:

ggplot(mydata2,
       aes(x = time, y = value, colour = interacted.variable)) +
  geom_point(size = 15) +
  scale_colour_manual(name = "",
                      values = colour.vector,
                      breaks = names(colour.vector)) +
  theme(legend.position = "bottom")

[image: plot]

An illustration with more factor levels (the code is the same except for the addition of guide_legend(byrow = TRUE) in the colour scale):

mydata3 <- data.frame(
  cat1 = factor(rep(1:3, times = 5)),
  # explicit factor(): since R 4.0.0, data.frame() no longer converts strings to factors
  cat2 = factor(rep(LETTERS[1:5], each = 3)),
  value = 1:15,
  time = 15:1
) %>%
  mutate(interacted.variable = interaction(cat1, cat2, sep = "-"),
         colour = hsv(h = as.integer(cat1) / length(levels(cat1)),
                      s = 0.3 + 0.7 * as.integer(cat2) / length(levels(cat2)),
                      v = 0.3 + 0.7 * as.integer(cat2) / length(levels(cat2))))

manual.colour <- mydata3 %>% arrange(cat1, cat2) %>%
  select(interacted.variable, colour) %>% unique()
colour.vector <- manual.colour$colour
names(colour.vector) <- manual.colour$interacted.variable
rm(manual.colour)

ggplot(mydata3,
       aes(x = time, y = value, colour = interacted.variable)) +
  geom_point(size = 15) +
  scale_colour_manual(name = "",
                      values = colour.vector,
                      breaks = names(colour.vector),
                      guide = guide_legend(byrow = TRUE)) +
  theme(legend.position = "bottom")
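
Since the colour calculation is identical in both examples, it could be wrapped in a small function. The sketch below is one possible way to do that in base R; interaction_colours() is a hypothetical helper name, not something provided by ggplot2, and it assigns a colour to every combination of levels rather than only the observed ones.

```r
# Hypothetical helper, not part of ggplot2 or of the code above.
# Returns a named colour vector covering every combination of levels.
interaction_colours <- function(f1, f2, sep = "-") {
  f1 <- as.factor(f1)
  f2 <- as.factor(f2)
  n1 <- nlevels(f1)
  n2 <- nlevels(f2)

  # All level combinations; the first factor varies fastest, which is the
  # same order interaction() uses for its levels by default.
  grid <- expand.grid(i1 = seq_len(n1), i2 = seq_len(n2))

  # f1 sets the hue; f2 sets saturation and value, kept above 0.3.
  # pmin() guards against floating-point results a hair above 1.
  sv <- pmin(1, 0.3 + 0.7 * grid$i2 / n2)
  colours <- hsv(h = grid$i1 / n1, s = sv, v = sv)

  names(colours) <- paste(levels(f1)[grid$i1], levels(f2)[grid$i2], sep = sep)
  colours
}

cat1 <- factor(c(1, 1, 2, 2))
cat2 <- factor(c("a", "b", "a", "b"))
colour.vector <- interaction_colours(cat1, cat2)
print(colour.vector)
```

The result can be passed to scale_colour_manual(values = colour.vector, breaks = names(colour.vector)) exactly as before, provided the same sep is used in interaction().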

[image: example with more factor levels]
