
adding both count and proportion to histogram in ggplot2 using dual y-axes

I am trying to create a histogram with both counts and relative frequency (or proportion) data displayed on the y-axes, the former on the left y-axis and the latter on the right. I have managed to create the basic plot but the percentage values I am getting are incorrect.

# loading necessary libraries
library(ggplot2)
library(scales)

# attempt to display both counts and proportions
ggplot2::ggplot(
  data = datasets::ToothGrowth,
  mapping = ggplot2::aes(x = len)
) +
  ggplot2::stat_bin(
    col = "black",
    alpha = 0.7,
    na.rm = TRUE,
    mapping = ggplot2::aes(
      y = ..count..
    )
  ) +
  ggplot2::scale_y_continuous(
    sec.axis = ggplot2::sec_axis(trans = ~ (.)/sum(.),
                                 labels = scales::percent,
                                 name = "proportion (in %)")
  ) +
  ggplot2::ylab("count") +
  ggplot2::guides(fill = FALSE)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The discrepancy becomes clear when you compare this with a second histogram that shows only the proportions.

# just displaying proportion
ggplot2::ggplot(
  data = datasets::ToothGrowth,
  mapping = ggplot2::aes(x = len)
) +
  ggplot2::stat_bin(
    col = "black",
    alpha = 0.7,
    na.rm = TRUE,
    mapping = ggplot2::aes(
      y = ..count.. / sum(..count..)
    )
  ) +
  ggplot2::scale_y_continuous(labels = scales::percent) +
  ggplot2::ylab("proportion (in %)") +
  ggplot2::guides(fill = FALSE)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

My guess is that the transformation function I am passing to sec_axis is incorrect, but I don't know any other way to do this. I would appreciate any help.

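Inside sec_axis, the formula is applied to values along the primary (count) axis rather than to the bin counts themselves, so sum(.) is not the total number of observations. Because every bar height has to be divided by the same number, you can pre-compute the denominator (tot_obs below) and use that scalar in the trans function: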

library(ggplot2)
library(scales)


# data
df <- datasets::ToothGrowth

# scalar for denominator
tot_obs <- nrow(df)

ggplot(data = df, mapping = aes(x = len)) +
  geom_histogram() +
  scale_y_continuous(
    sec.axis = sec_axis(trans = ~ . / tot_obs, labels = percent,
                        name = "proportion (in %)")) +
  ylab("count") +
  guides(fill = "none")
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Created on 2018-08-16 by the reprex package (v0.2.0).
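
As a rough sanity check of the arithmetic, the sketch below bins the same data with base R's hist() and divides the counts by the scalar denominator. It is only an illustration under assumptions: hist() picks its own breaks, so the bins will not match ggplot2's default 30 bins exactly, and axis_vals is a hypothetical stand-in for the values that the secondary-axis formula receives.

```r
# Bin the data without plotting (base R; bins differ from ggplot2's default).
len <- datasets::ToothGrowth$len
counts <- hist(len, breaks = 30, plot = FALSE)$counts

# Scalar denominator: total number of observations.
tot_obs <- length(len)

# Dividing every count by the same scalar gives per-bin proportions.
props <- counts / tot_obs
print(sum(props))

# Hypothetical stand-in for the values passed to the secondary-axis formula:
# these are positions along the count axis, not the bin counts.
axis_vals <- c(0, 2, 4, 6)

# Dividing by their own sum depends on which values happen to be passed in.
print(axis_vals / sum(axis_vals))

# Dividing by the scalar gives the proportion each count value represents.
print(axis_vals / tot_obs)
```

Using a fixed scalar keeps the secondary axis a simple rescaling of the primary one, so a given count always maps to the same proportion regardless of which axis values are transformed.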
