I want to create a bar plot in R that shows several variables (columns) in a single chart. So far I can only plot two variables against each other, using the following code:
barplot(table(y = cut$Gender,x = cut$Education))
Even then, Gender is stacked inside each Education bar instead of getting a bar of its own.
The type of chart I want has one stacked bar per variable, with each bar split into that variable's categories.
My sample dataset (assigned to df, the name used in the answer below):
df <- structure(list(Gender = c("Male", "Male", "Male", "Male", "Male",
"Male", "Male", "Male", "Female", "Male", "Male", "Male", "Male",
"Female", "Male", "Female", "Male", "Male", "Male", "Male"),
Age = c("45-54 yrs", "35-44 yrs", "25-34 yrs", "25-34 yrs",
"25-34 yrs", "45-54 yrs", "25-34 yrs", "25-34 yrs", "25-34 yrs",
"35-44 yrs", "18-24 yrs", "25-34 yrs", "25-34 yrs", "55-64 yrs",
"35-44 yrs", "35-44 yrs", "35-44 yrs", "45-54 yrs", "35-44 yrs",
"45-54 yrs"), Employment = c("Civil servant", "Private sector",
"Private sector", "Private sector", "Trader", "Civil servant",
"Private sector", "Private sector", "Private sector", "Civil servant",
"Student", "Student", "Civil servant", "Retired", "Self-employed",
"Private sector", "Civil servant", "Civil servant", "Private sector",
"Private sector"), Marriage = c("Married", "Married", "Married",
"Married", "Single, never married", "Married", "Married",
"Married", "Married", "Married", "Single, never married",
"Single, never married", "Married", "Married", "Married",
"Married", "Married", "Married", "Married", "Married"), Education = c("Advanced degree",
"Advanced degree", "Bachelor's degree", "Bachelor's degree",
"Secondary education", "Advanced degree", "Bachelor's degree",
"Bachelor's degree", "Secondary education", "Secondary education",
"Secondary education", "Secondary education", "Advanced degree",
"Bachelor's degree", "Basic education", "Advanced degree",
"Advanced degree", "Advanced degree", "Advanced degree",
"Advanced degree"), Residence = c("Ashanti", "Ashanti", "Ashanti",
"Ashanti", "Ashanti", "Brong-Ahafo", "Brong-Ahafo", "Brong-Ahafo",
"Brong-Ahafo", "Brong-Ahafo", "Brong-Ahafo", "Brong-Ahafo",
"Central", "Central", "Eastern", "Greater Accra", "Greater Accra",
"Greater Accra", "Greater Accra", "Greater Accra"), Experience = c("Never",
"Never", "Never", "Never", "Never", "Never", "Never", "Never",
"Never", "Never", "Never", "Never", "Never", "Never", "Never",
"Never", "Never", "Never", "Never", "Never")), .Names = c("Gender",
"Age", "Employment", "Marriage", "Education", "Residence", "Experience"
), row.names = c(NA, 20L), class = "data.frame")
Here is an approach:
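First convert the data to long format. There are two options here: melt() from the reshape package or gather() from tidyr (in tidyr 1.0.0 and later, gather() is superseded by pivot_longer()). Here I will use the tidyverse package, which loads many useful packages, including tidyr and ggplot2.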
library(tidyverse)
df %>%
gather(variable, value)
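To see what the long format looks like, here is a minimal sketch on a small hypothetical data frame (small_df is made up for illustration and is not the question's data). It also shows pivot_longer(), tidyr's replacement for gather(), which yields the same variable/value pairs, only in a different row order.

```r
library(tidyr)

# Hypothetical two-column data frame standing in for the real survey data
small_df <- data.frame(
  Gender = c("Male", "Female", "Male"),
  Age    = c("25-34 yrs", "35-44 yrs", "25-34 yrs"),
  stringsAsFactors = FALSE
)

# gather(): one row per (column, cell), stacked column by column
long_gather <- gather(small_df, variable, value)

# pivot_longer(): the same pairs, but stacked row by row
long_pivot <- pivot_longer(small_df, everything(),
                           names_to = "variable", values_to = "value")

print(long_gather)
print(long_pivot)
```

Either result can be piped into ggplot() the same way, because the bar heights depend only on how often each (variable, value) pair occurs, not on row order.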
Then make a bar plot with ggplot2
ggplot()+
geom_bar(aes(x = variable, fill = value), color = "black" , position = "stack", show.legend = FALSE)
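To add text annotations, add a geom_text layer. The label positions come from stat = "count", which computes the special variable ..count.. (written after_stat(count) in ggplot2 3.3.0 and later). Combined with position = "stack", this puts each label at the top of its bar segment. Since a label sitting right on the segment border looks a bit crude, vjust = 1 makes each label hang just below the top of its segment.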
geom_text(stat = "count", aes(x = variable, label = value,
y = ..count..,
group = value),
position = "stack", vjust = 1)
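A minimal sketch, assuming ggplot2 3.3.0 or later (for after_stat()) and a small hypothetical data frame, that builds the bar and text layers and then uses layer_data() to inspect where the stacked labels end up. The highest label in each bar should sit at that bar's total count.

```r
library(ggplot2)
library(tidyr)

# Hypothetical data: 4 respondents, 2 variables
small_df <- data.frame(
  Gender = c("Male", "Female", "Male", "Male"),
  Age    = c("25-34 yrs", "35-44 yrs", "25-34 yrs", "45-54 yrs"),
  stringsAsFactors = FALSE
)

long_df <- gather(small_df, variable, value)

p <- ggplot(long_df) +
  geom_bar(aes(x = variable, fill = value),
           color = "black", position = "stack", show.legend = FALSE) +
  # after_stat(count) is the newer spelling of ..count..
  geom_text(stat = "count",
            aes(x = variable, label = value,
                y = after_stat(count), group = value),
            position = "stack", vjust = 1)

# Computed data for the text layer (layer 2): one row per label,
# where y is the stacked top of that label's segment
labels_df <- layer_data(p, 2)
print(labels_df[, c("x", "label", "count", "y")])
```

With 4 respondents, both bars have a total height of 4, so the topmost label in each bar is placed at y = 4.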
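To put percent labels on the y axis, the usual approach is y = (..count..)/sum(..count..). However, sum(..count..) here is the total count across all variables rather than within each bar, so it is not appropriate. The easiest solution is to label the axis manually: every variable has 20 observations, so each bar reaches 20, and breaks at 0, 5, 10, 15 and 20 correspond to 0%, 25%, 50%, 75% and 100%: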
scale_y_continuous(labels = c("0%", "25%", "50%", "75%", "100%"),
breaks = c(0, 5, 10, 15, 20))
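If you would rather not hard-code the breaks, one alternative (a different technique from the one used in this answer) is to compute the within-variable proportions yourself with dplyr's count() and group_by(), then draw them with geom_col() and scales::percent. A minimal sketch with a small hypothetical data frame:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical data: 4 respondents, 2 variables
small_df <- data.frame(
  Gender = c("Male", "Female", "Male", "Male"),
  Age    = c("25-34 yrs", "35-44 yrs", "25-34 yrs", "45-54 yrs"),
  stringsAsFactors = FALSE
)

props <- small_df %>%
  gather(variable, value) %>%
  count(variable, value) %>%          # n = occurrences of each category
  group_by(variable) %>%
  mutate(prop = n / sum(n)) %>%       # share within each variable, sums to 1
  ungroup()

# geom_col() stacks by default; centre each label in its segment
p <- ggplot(props, aes(x = variable, y = prop, fill = value)) +
  geom_col(color = "black", show.legend = FALSE) +
  geom_text(aes(label = value), position = position_stack(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent)
```

Because each variable's proportions already sum to 1, every bar reaches 100% regardless of the number of rows, so no hard-coded breaks are needed.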
Putting it all together:
library(tidyverse)
df %>%
  gather(variable, value) %>%
  ggplot() +
  geom_bar(aes(x = variable, fill = value),
           color = "black",
           position = "stack", show.legend = FALSE) +
  geom_text(stat = "count",
            aes(x = variable,
                label = value,
                y = ..count..,
                group = value),
            position = "stack", vjust = 1) +
  scale_y_continuous(labels = c("0%", "25%", "50%", "75%", "100%"),
                     breaks = c(0, 5, 10, 15, 20))
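Another option is y = ..count../sum(..count..)*7. Since there are 7 variables, sum(..count..) is 7 times the height of a single bar, so multiplying by 7 rescales each bar to a total of 1: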
df %>%
  gather(variable, value) %>%
  ggplot() +
  geom_bar(aes(x = variable, y = ..count../sum(..count..)*7, fill = value),
           color = "black", position = "stack", show.legend = FALSE) +
  geom_text(stat = "count",
            aes(x = variable, label = value,
                y = ..count../sum(..count..)*7, group = value),
            position = "stack", vjust = 1) +
  scale_y_continuous(labels = scales::percent) +
  ylab("")
This produces the same graph.
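The factor of 7 can be checked with the sample data: 7 variables with 20 observations each give 140 counts in total, so for a bar segment with count c:

```latex
\frac{c}{\sum \text{count}} \times 7 = \frac{c}{7 \times 20} \times 7 = \frac{c}{20}
```

which is that segment's share of its own bar. The multiplier therefore has to match the number of gathered columns.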
You can also add conditional line breaks to the labels, using mutate() with gsub() and a negative lookahead, so that spaces before "yrs" are kept and age bands such as "25-34 yrs" stay on one line:
df %>%
  gather(variable, value) %>%
  # Replace every space that is not followed by "yrs" with a line break
  mutate(label = gsub(" (?!yrs)", "\n", value, perl = TRUE)) %>%
  ggplot() +
  geom_bar(aes(x = variable, y = ..count../sum(..count..)*7, fill = value),
           color = "black", position = "stack", show.legend = FALSE) +
  geom_text(stat = "count",
            aes(x = variable, label = label,
                y = ..count../sum(..count..)*7, group = value),
            position = "stack", vjust = 1) +
  scale_y_continuous(labels = scales::percent) +
  ylab("")
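To check what the negative lookahead does before plotting, here is a minimal sketch that applies the same gsub() call to a few category labels from the sample data. The pattern " (?!yrs)" matches a space only when it is not followed by "yrs"; perl = TRUE is required because lookaheads are a PCRE feature that R's default regex engine does not support.

```r
# A few category labels taken from the sample data
values <- c("25-34 yrs", "Private sector", "Single, never married")

# Replace each space NOT followed by "yrs" with a newline (PCRE lookahead)
wrapped <- gsub(" (?!yrs)", "\n", values, perl = TRUE)

# wrapped is c("25-34 yrs", "Private\nsector", "Single,\nnever\nmarried")
print(wrapped)
```

Age bands keep their single space, while multi-word categories are broken at every space.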