
Plot ratio of geom_bar on second y-axis

I would like to plot the ratio of the geom_bar values as a line (geom_line) on a second y-axis. Here is my data frame:

df <- data.frame(code=c('F6', 'F6','D4', 'D4', 'F5', 'F5', 'C4', 'C4', 'F7', 'F7'),
           group=c('0','1','0','1','0','1','0','1','0','1'),
           count=c(80, 700, 30, 680, 100, 360, 70, 230, 40, 200))

For the moment, I plot the following figure:

ggplot(df, aes(x=code, y=count, fill=group)) +
  geom_bar(stat ="identity", position="dodge")

[image]

I would also like to show the ratio between groups. For example, for C4 it would be 70/230*100 ≈ 30%. Here is what it could look like:

[image]

Any ideas?

You can do this by using the tidyverse library to calculate the percentage for each group, then adding that to your plot using a secondary axis:

library(tidyverse)

df <- data.frame(code=c('F6', 'F6','D4', 'D4', 'F5', 'F5', 'C4', 'C4', 'F7', 'F7'),
                 group=c('0','1','0','1','0','1','0','1','0','1'),
                 count=c(80, 700, 30, 680, 100, 360, 70, 230, 40, 200))

Now make a second data frame that holds the percentage you described. I used spread to put the two groups in separate columns. I also multiplied the percentage by 7: the percentage runs from 0 to 100, but it has to be drawn on a graph whose count axis runs from 0 to 700, so 7*100 fills the entire graph. Finally, I added a new field called "order" because geom_line will not connect points across a discrete x axis (code) on its own, so the line is drawn against a numeric row index instead.

percentage.df <- df %>%
  spread(group, count) %>%
  mutate(percentage = 7 * (`0` / `1`) * 100) %>%
  mutate(order = c(1:nrow(.)))

Now, when you plot this, you can specify a secondary axis, but you have to tell ggplot to divide the values by 7 so that the secondary axis labels make sense.

ggplot(df, aes(x=code, y=count)) +
  geom_bar(stat ="identity", position="dodge", aes(fill=group)) +
  geom_point(data = percentage.df, aes(code, percentage)) +
  geom_line(data = percentage.df, aes(order, percentage)) +
  scale_y_continuous(sec.axis = sec_axis(~ . /7))

[image]
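The factor of 7 above is specific to this data (700 counts against 100 percent). As a rough sketch of the same idea, the factor can be derived from the data instead of being hard-coded. The names scale_factor, percentage_df and scaled are made up for this illustration, pivot_wider is used here in place of spread, and group = 1 replaces the "order" column. It assumes the percentage axis should span 0-100 over the full count range.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

df <- data.frame(
  code  = c("F6", "F6", "D4", "D4", "F5", "F5", "C4", "C4", "F7", "F7"),
  group = c("0", "1", "0", "1", "0", "1", "0", "1", "0", "1"),
  count = c(80, 700, 30, 680, 100, 360, 70, 230, 40, 200)
)

# Assumption: 100% on the secondary axis lines up with the largest count.
scale_factor <- max(df$count) / 100

# One row per code, with the two groups side by side.
percentage_df <- df %>%
  pivot_wider(names_from = group, values_from = count) %>%
  mutate(
    percentage = `0` / `1` * 100,          # ratio of group 0 to group 1, in percent
    scaled     = percentage * scale_factor # same value expressed on the count axis
  )

p <- ggplot(df, aes(x = code, y = count)) +
  geom_col(aes(fill = group), position = "dodge") +
  geom_point(data = percentage_df, aes(y = scaled)) +
  # group = 1 tells geom_line to connect the points across the discrete x axis
  geom_line(data = percentage_df, aes(y = scaled, group = 1)) +
  # The secondary axis undoes the scaling so its labels read as percentages
  scale_y_continuous(sec.axis = sec_axis(~ . / scale_factor, name = "ratio 0/1 (%)"))

print(p)
```

The important part is that the value plotted is multiplied by scale_factor and the secondary axis divides by the same number; if the two differ, the line and the right-hand labels will no longer agree.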

You can try to normalise the ratios to the maximum y-value (count).

library(tidyverse)
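# Largest count; used to rescale the ratio onto the count axis
MAX <- max(df$count)

df %>%
  group_by(code) %>%
  # Ratio of group 0 to group 1; relies on the group 0 row coming first within each code
  mutate(ratio = count[1] / count[2]) %>%
  mutate(ratio_norm = MAX * ratio) %>%
  ggplot(aes(x = code)) +
  geom_col(aes(y = count, fill = group), position = "dodge") +
  geom_point(data = . %>% distinct(code, ratio_norm), aes(y = ratio_norm)) +
  geom_line(data = . %>% distinct(code, ratio_norm), aes(y = ratio_norm, group = 1)) +
  # Undo the rescaling for the secondary axis and label it as a percentage
  scale_y_continuous(sec.axis = sec_axis(~ . / MAX, labels = scales::percent))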

[image]
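One caveat with count[1]/count[2] is that it depends on the rows being ordered as group 0, then group 1, within each code. A possible variation, only a sketch, is to pick the values by group label instead. It assumes every code has exactly one row for group "0" and one for group "1"; the names shuffled and ratios are made up for this example.

```r
library(dplyr)

df <- data.frame(
  code  = c("F6", "F6", "D4", "D4", "F5", "F5", "C4", "C4", "F7", "F7"),
  group = c("0", "1", "0", "1", "0", "1", "0", "1", "0", "1"),
  count = c(80, 700, 30, 680, 100, 360, 70, 230, 40, 200)
)

# Swap the two rows of every code to show that row order no longer matters.
shuffled <- df[c(2, 1, 4, 3, 6, 5, 8, 7, 10, 9), ]

ratios <- shuffled %>%
  group_by(code) %>%
  # Select by group label rather than by position within the group
  summarise(ratio = count[group == "0"] / count[group == "1"], .groups = "drop")

print(ratios)
```

The resulting ratio column can then be multiplied by MAX and plotted exactly as in the answer above.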
