
Adding subcategories to a legend in ggplot2

Let's say we have the following hierarchical data on the habitats that make up my fantasy island (which is of course always warm and sunny!):

library(dplyr)
library(ggplot2)

set.seed(1)

hab_dat <- data.frame(
  habitat_type = rep(c("sea", "coast", "land"), times = 1, each = 3),
  habitat_name = c("rocky", "sandy", "seaweed",
                  "beach", "pebbles", "rockpools",
                  "fields", "hills", "forest"),
  area_km2 = sample(10:40, size = 9))
  
hab_dat

I want to plot the total area of each habitat type, so I wrote the following code:

hab_dat %>% 
  group_by(habitat_type) %>% 
  summarise(area_km2 = sum(area_km2)) %>%
  ggplot(aes(x = habitat_type, y = area_km2, fill = habitat_type)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = c("gold", "forestgreen", "blue"))

Looks good, but the legend is not very informative. I would like the habitats contained within each habitat type to be listed in the legend under the appropriate habitat type, purely as qualitative information. Here is an example I made in Paint: [image]

I can get a bit closer using the following code without affecting the appearance of the plot; however, I am missing the habitat_type titles and also have multiple tiles for the same colour.

hab_dat <- hab_dat %>% mutate(col = rep(c("blue", "gold", "forestgreen"), times = 1, each = 3))

pal <- setNames(as.character(hab_dat$col), as.character(hab_dat$habitat_name))

ggplot(hab_dat, aes(x = habitat_type, y = area_km2, fill = habitat_name)) +
  geom_bar(position = "stack", stat = "identity") +
  scale_fill_manual(values = pal)

[image]

I have been looking at solutions along the lines of this one, but I am after a more automated solution, as my actual data is a bit larger than this, and one that shows the colour tile once per group, as in my drawing.

I don't think there is an elegant solution that addresses your problem. Instead, I suggest formatting the labels to imply the hierarchy.

library(ggplot2)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

set.seed(1)

hab_dat <- data.frame(
  habitat_type = rep(c("sea", "coast", "land"), times = 1, each = 3),
  habitat_name = c("rocky", "sandy", "seaweed",
                   "beach", "pebbles", "rockpools",
                   "fields", "hills", "forest"),
  area_km2 = sample(10:40, size = 9))

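# Format labels: one multi-line label per habitat_type, listing its habitats
labels <- split(hab_dat$habitat_name, hab_dat$habitat_type)
labels <- unlist(Map(function(top, bottom) {
  # paste0() avoids the extra space that paste() would insert after "- "
  paste0(top, "\n", paste0("- ", bottom, collapse = "\n"))
}, top = names(labels), bottom = labels))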


hab_dat %>% 
  group_by(habitat_type) %>% 
  summarise(area_km2 = sum(area_km2)) %>%
  ggplot(aes(x = habitat_type, y = area_km2, fill = habitat_type)) +
  geom_bar(stat = "identity") +
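  scale_fill_manual(
    # Unnamed values are matched to the levels in alphabetical order:
    # coast, land, sea
    values = c("gold", "forestgreen", "blue"),
    labels = function(i) {labels[i]} # Look up the formatted label by habitat_type
  )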

Created on 2022-07-19 by the reprex package (v2.0.1)
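
If your real data is larger, the label-building step can be wrapped in a small function so nothing has to be typed by hand. The following is only a sketch in base R: make_hier_labels() and type_cols are names I made up for illustration, not anything provided by ggplot2, and the data frame is the same toy data without the area column.

```r
# Hypothetical helper (not part of ggplot2): build one multi-line legend
# label per group, with the members of that group listed underneath it.
make_hier_labels <- function(group, member, bullet = "- ") {
  # split() returns a list of members per group, ordered by sorted group name
  members_by_group <- split(as.character(member), as.character(group))
  vapply(names(members_by_group), function(g) {
    paste0(g, "\n", paste0(bullet, members_by_group[[g]], collapse = "\n"))
  }, character(1))
}

hab_dat <- data.frame(
  habitat_type = rep(c("sea", "coast", "land"), each = 3),
  habitat_name = c("rocky", "sandy", "seaweed",
                   "beach", "pebbles", "rockpools",
                   "fields", "hills", "forest")
)

# Named character vector: names are the habitat types, values are the labels
hier_labels <- make_hier_labels(hab_dat$habitat_type, hab_dat$habitat_name)
cat(hier_labels[["coast"]], "\n")

# Assumption for illustration: colours named by habitat_type, so the
# mapping does not rely on the alphabetical order of the levels.
type_cols <- c(sea = "blue", coast = "gold", land = "forestgreen")
```

These should then drop into the plot above as scale_fill_manual(values = type_cols, labels = function(i) hier_labels[i]), using the same lookup idea as before. The colour tile still appears once per habitat type, and the sub-habitats are only text within the label, so this implies the hierarchy rather than drawing a true nested legend.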

