
R bar-plot with different variables in multiple columns

I want to create an R bar-plot with different variables in multiple columns, all in one chart. I am only able to do a 2x2 plot with the following code:

barplot(table(y = cut$Gender,x = cut$Education))

Even so, Gender gets stacked on top of Education.

[Image: Gender and education level of respondents]

The type of chart I want looks like this: [image]

My sample dataset is:

structure(list(Gender = c("Male", "Male", "Male", "Male", "Male", 
"Male", "Male", "Male", "Female", "Male", "Male", "Male", "Male", 
"Female", "Male", "Female", "Male", "Male", "Male", "Male"), 
    Age = c("45-54 yrs", "35-44 yrs", "25-34 yrs", "25-34 yrs", 
    "25-34 yrs", "45-54 yrs", "25-34 yrs", "25-34 yrs", "25-34 yrs", 
    "35-44 yrs", "18-24 yrs", "25-34 yrs", "25-34 yrs", "55-64 yrs", 
    "35-44 yrs", "35-44 yrs", "35-44 yrs", "45-54 yrs", "35-44 yrs", 
    "45-54 yrs"), Employment = c("Civil servant", "Private sector", 
    "Private sector", "Private sector", "Trader", "Civil servant", 
    "Private sector", "Private sector", "Private sector", "Civil servant", 
    "Student", "Student", "Civil servant", "Retired", "Self-employed", 
    "Private sector", "Civil servant", "Civil servant", "Private sector", 
    "Private sector"), Marriage = c("Married", "Married", "Married", 
    "Married", "Single, never married", "Married", "Married", 
    "Married", "Married", "Married", "Single, never married", 
    "Single, never married", "Married", "Married", "Married", 
    "Married", "Married", "Married", "Married", "Married"), Education = c("Advanced degree", 
    "Advanced degree", "Bachelor's degree", "Bachelor's degree", 
    "Secondary education", "Advanced degree", "Bachelor's degree", 
    "Bachelor's degree", "Secondary education", "Secondary education", 
    "Secondary education", "Secondary education", "Advanced degree", 
    "Bachelor's degree", "Basic education", "Advanced degree", 
    "Advanced degree", "Advanced degree", "Advanced degree", 
    "Advanced degree"), Residence = c("Ashanti", "Ashanti", "Ashanti", 
    "Ashanti", "Ashanti", "Brong-Ahafo", "Brong-Ahafo", "Brong-Ahafo", 
    "Brong-Ahafo", "Brong-Ahafo", "Brong-Ahafo", "Brong-Ahafo", 
    "Central", "Central", "Eastern", "Greater Accra", "Greater Accra", 
    "Greater Accra", "Greater Accra", "Greater Accra"), Experience = c("Never", 
    "Never", "Never", "Never", "Never", "Never", "Never", "Never", 
    "Never", "Never", "Never", "Never", "Never", "Never", "Never", 
    "Never", "Never", "Never", "Never", "Never")), .Names = c("Gender", 
"Age", "Employment", "Marriage", "Education", "Residence", "Experience"
), row.names = c(NA, 20L), class = "data.frame")

Here is an approach, assuming the sample data is stored in a data frame named df:

First, convert the data to long format. There are two options here: melt from the reshape package or gather from tidyr. I will use the tidyverse library, which loads tidyr, ggplot2, and other useful packages.

library(tidyverse)

df %>%
  gather(variable, value)
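
To make the long format concrete, here is a minimal sketch on a made-up three-row data frame (toy is a hypothetical name, not part of the original data). Called without a column selection, gather stacks every column, so each original cell becomes one row with its column name in variable and its content in value.

```r
library(tidyr)

# Toy data (assumption): three rows of two columns, values borrowed from the question
toy <- data.frame(
  Gender = c("Male", "Male", "Female"),
  Age    = c("45-54 yrs", "35-44 yrs", "25-34 yrs"),
  stringsAsFactors = FALSE
)

# No columns are named, so gather() stacks all of them into key/value pairs
toy_long <- gather(toy, variable, value)

# 3 rows x 2 columns in wide form become 6 rows in long form
print(toy_long)
```

For the full dataset this gives 20 x 7 = 140 rows, which is why each bar in the plot has a total height of 20.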

Then make a bar plot with ggplot2:

ggplot()+
  geom_bar(aes(x = variable, fill = value),
           color = "black",
           position = "stack", show.legend = FALSE)

To add text annotations, add a geom_text layer. The label positions are determined by stat = "count", which calculates a special variable ..count.. that, combined with position = "stack", corresponds to the top of each bar segment. Since this looks a bit crude on the plot, we can adjust it with vjust = 1:

geom_text(stat = "count", aes(x = variable, label =  value,
                              y = ..count..,
                              group = value),
          position = "stack", vjust = 1)

To add percent labels on the y axis, the usual approach is y = (..count..)/sum(..count..). However, sum(..count..) is the sum of counts across all variables, which is not appropriate here, so the easiest solution is to label the axis manually. Each variable has 20 observations, so a count of 5 corresponds to 25%:

scale_y_continuous(labels =  c("0%", "25%", "50%", "75%", "100%"),
                   breaks = c(0, 5, 10, 15, 20))
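
The hard-coded breaks only fit a dataset with exactly 20 rows. As a hedged sketch, the same breaks and labels can be derived from the row count; n, pct_breaks and pct_labels are illustrative names, and n is set by hand here where nrow(df) would be used in practice.

```r
# n stands in for nrow(df); the question's sample data has 20 rows
n <- 20

# Five evenly spaced break positions on the count scale
pct_breaks <- seq(0, n, length.out = 5)

# The matching labels, expressed as a percentage of n
pct_labels <- paste0(100 * pct_breaks / n, "%")

print(pct_breaks)
print(pct_labels)

# These could then be passed on as:
# scale_y_continuous(breaks = pct_breaks, labels = pct_labels)
```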

How it looks all together:

library(tidyverse)

df %>%
  gather(variable, value) %>%
  ggplot()+
  geom_bar(aes(x = variable, fill = value),
           color = "black",
           position = "stack", show.legend = FALSE)+
  geom_text(stat = "count",
            aes(x = variable,
                label = value,
                y = ..count..,
                group = value),
            position = "stack", vjust = 1)+
  scale_y_continuous(labels = c("0%", "25%", "50%", "75%", "100%"),
                     breaks = c(0, 5, 10, 15, 20))
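
In recent releases, tidyr marks gather as superseded by pivot_longer, and ggplot2 (3.4.0 and later) deprecates the ..count.. notation in favour of after_stat(count). The following is a minimal sketch of the same kind of plot with the newer functions, assuming reasonably current package versions. It uses a small made-up data frame (toy, a hypothetical name) so that it runs on its own.

```r
library(tidyr)
library(ggplot2)

# Toy data (assumption): four rows of two columns, values borrowed from the question
toy <- data.frame(
  Gender   = c("Male", "Male", "Female", "Male"),
  Marriage = c("Married", "Married", "Married", "Single, never married"),
  stringsAsFactors = FALSE
)

# pivot_longer() is the current replacement for gather()
toy_long <- pivot_longer(toy, cols = everything(),
                         names_to = "variable", values_to = "value")

# after_stat(count) replaces the older ..count.. notation
p <- ggplot(toy_long, aes(x = variable)) +
  geom_bar(aes(fill = value), color = "black",
           position = "stack", show.legend = FALSE) +
  geom_text(stat = "count",
            aes(label = value, y = after_stat(count), group = value),
            position = "stack", vjust = 1)

print(p)
```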


Another option is y = ..count../sum(..count..)*7. Since there are 7 variables with the same number of observations, multiplying by 7 rescales the overall proportion to the proportion within each variable:

df %>%
  gather(variable, value) %>%
  ggplot()+
  geom_bar(aes(x = variable, y = ..count../sum(..count..)*7, fill = value),
           color = "black",
           position = "stack", show.legend = FALSE)+
  geom_text(stat = "count",
            aes(x = variable,
                label = value,
                y = ..count../sum(..count..)*7,
                group = value),
            position = "stack", vjust = 1)+
  scale_y_continuous(labels = scales::percent)+
  ylab("")

This produces the same graph as above.
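
The factor of 7 can be checked with a little arithmetic. The sketch below uses the sample data's dimensions (20 rows, 7 variables) and the 17 "Male" entries in Gender. The variable names are illustrative only.

```r
n_rows <- 20   # observations per variable in the sample data
n_vars <- 7    # number of gathered variables
n_male <- 17   # rows with Gender == "Male" in the sample data

# sum(..count..) runs over all bars, i.e. over every row of the long data
total <- n_rows * n_vars

# Rescaling the overall proportion by the number of variables ...
scaled <- n_male / total * n_vars

# ... gives the proportion within a single variable
within_variable <- n_male / n_rows

print(c(scaled = scaled, within_variable = within_variable))
```

This shortcut relies on every variable having the same number of non-missing observations. With unequal counts, the proportions would need to be computed per variable instead.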

You can even add conditional line breaks to the labels using mutate with gsub and a negative lookahead. The pattern " (?!yrs)" matches every space that is not followed by "yrs", so the age ranges stay on one line:

df %>%
  gather(variable, value) %>%
  mutate(label = gsub(" (?!yrs)", "\n", value, perl = TRUE)) %>%
  ggplot()+
  geom_bar(aes(x = variable, y = ..count../sum(..count..)*7, fill = value),
           color = "black",
           position = "stack", show.legend = FALSE)+
  geom_text(stat = "count",
            aes(x = variable,
                label = label,
                y = ..count../sum(..count..)*7,
                group = value),
            position = "stack", vjust = 1)+
  scale_y_continuous(labels = scales::percent)+
  ylab("")
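
To see what the lookahead does in isolation, here is a small base R sketch applied to three values taken from the sample data (samples and wrapped are illustrative names).

```r
# Three label values taken from the question's sample data
samples <- c("25-34 yrs", "Single, never married", "Bachelor's degree")

# Replace each space with a newline unless the space is followed by "yrs";
# perl = TRUE is needed because lookaheads are a PCRE feature
wrapped <- gsub(" (?!yrs)", "\n", samples, perl = TRUE)

# print() shows the inserted newlines as \n escapes
print(wrapped)
```

The age range is left untouched, while the other two labels are split at every space.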



 