
How to make a grouped bar chart with two categorical variables that shows proportions?

Consider a dataset like the one below:

Col1   Col2
A      BOY
B      GIRL
A      BOY
B      BOY
A      BOY
B      GIRL

Both columns are categorical variables. I want to make a grouped bar chart of the two variables with the proportion on the y axis, as position = "fill" does.

How do I do that?

This is what I have:

ggplot(aboveData, aes(x = Col1, fill = Col2)) + geom_bar(position = "fill")

This produces a stacked bar chart. I want a grouped one.

We first tally the counts:

library(dplyr)
library(ggplot2)

df = structure(list(Col1 = structure(c(1L, 2L, 1L, 2L, 1L, 2L), .Label = c("A", 
"B"), class = "factor"), Col2 = structure(c(1L, 2L, 1L, 1L, 1L, 
2L), .Label = c("BOY", "GIRL"), class = "factor")), class = "data.frame", row.names = c(NA, 
-6L))

tab <- df %>% group_by(Col1,Col2,.drop=FALSE) %>% tally()

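It's not clear what you mean by proportion. If you mean the proportion within each level of the x variable (as commonly plotted), note that tally() leaves the result grouped by Col1, so sum(n) below is computed within each Col1 group: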

tab %>%
  mutate(perc = n / sum(n)) %>%
  ggplot() +
  geom_col(aes(x = Col1, y = perc, fill = Col2), position = "dodge") +
  scale_y_continuous(labels = scales::percent)


If you meant the proportion of all observations, ungroup first:

tab %>%
  ungroup() %>%
  mutate(perc = n / sum(n)) %>%
  ggplot() +
  geom_col(aes(x = Col1, y = perc, fill = Col2), position = "dodge") +
  scale_y_continuous(labels = scales::percent)
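
As a quick check on the two definitions of proportion above, the same numbers can be computed in base R with table() and prop.table(). This is only a sketch: the names counts, within_col1 and overall are introduced here for illustration, and the data frame is re-created so the snippet runs on its own.

```r
# Re-create the example data (same six rows as in the question)
df <- data.frame(
  Col1 = factor(c("A", "B", "A", "B", "A", "B")),
  Col2 = factor(c("BOY", "GIRL", "BOY", "BOY", "BOY", "GIRL"))
)

# Cross-tabulate: rows are Col1 levels, columns are Col2 levels
counts <- table(df$Col1, df$Col2)

# Proportion within each Col1 level (each row sums to 1)
within_col1 <- prop.table(counts, margin = 1)

# Proportion of all observations (the whole table sums to 1)
overall <- prop.table(counts)

print(within_col1)
print(overall)
```

Within Col1, A is 100% BOY, but A/BOY is only 50% of all six rows, which is why the bar heights differ between the two plots.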

It might be easier to work with ggplot using data in long format (instead of wide): reshape so that each row holds one value of one variable, then calculate the proportion of each level (A, B, BOY, GIRL) within its variable (Col1, Col2).

library(dplyr)
library(tidyr)    # provides pivot_longer()
library(ggplot2)

# Your data
df <- data.frame(Col1 = rep(c("A", "B"), 3),
                 Col2 = c("BOY", "GIRL", "BOY", "BOY", "BOY", "GIRL"))

df1<-df %>%
  #Change to long format
  pivot_longer(cols = c(Col1,Col2),
               names_to = "var") %>%
  group_by(value, var) %>%
  #Get the frequencies of A, B, Boy and Girl
  count() %>%
  ungroup() %>%
  #Group by var, which now has level Col1 and Col2
  group_by(var) %>%
  #Calculate proportion
  mutate(perc = n / sum(n))

ggplot(df1, aes(x = var, 
                y = perc,
                fill = value)) + 
  geom_col(position = "dodge")

[Figure: grouped bar chart]
