
R ggplot2: complex stacked barchart with multiple categorical variables

My dataset in R looks like the following:

a <- c("M","F","F","F","M","M","F","F","F","M","F","F","M","M","F")
p <- c("P","P","W","W","P","P","W","W","W","W","P","P","P","W","W")
y1 <- c("yes","yes","null","no","no","no","yes","null","no","yes","yes","yes","null","no","no")
y2 <- c("yes","null","no","no","no","yes","yes","yes","null","no","yes","null","no","yes","yes")
y3 <- c("no","no","no","yes","null","yes","null","no","no","no","yes","yes","null","no","no")
VE <- data.frame(gender = a,
             type = p,
             y1 = y1,
             y2 = y2,
             y3 = y3)

I would like to create a bar chart that looks like this: [image: ideal bar chart]

I figured out a long-winded way to get the chart:

q<-data.frame(gender=VE$gender,
          year=rep("y1",15),
          group=VE$y1)
p<-data.frame(gender=VE$gender,
          year=rep("y2",15),
          group=VE$y2)
x<-data.frame(gender=VE$gender,
          year=rep("y3",15),
          group=VE$y3)
Table<-rbind(q,p,x)
ggplot(Table, aes(year)) + geom_bar(aes(fill=group), position = "stack") + facet_grid(gender~.)
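The copy-pasted q/p/x steps above can also be generated in a loop, which matters once there are 32 columns instead of three. This is a minimal base-R sketch, assuming the year columns are named y1, y2, y3 as in VE. It produces the same Table and the same plot:

```r
library(ggplot2)

# Recreate the example data from the question
VE <- data.frame(
  gender = c("M","F","F","F","M","M","F","F","F","M","F","F","M","M","F"),
  type   = c("P","P","W","W","P","P","W","W","W","W","P","P","P","W","W"),
  y1 = c("yes","yes","null","no","no","no","yes","null","no","yes","yes","yes","null","no","no"),
  y2 = c("yes","null","no","no","no","yes","yes","yes","null","no","yes","null","no","yes","yes"),
  y3 = c("no","no","no","yes","null","yes","null","no","no","no","yes","yes","null","no","no"),
  stringsAsFactors = FALSE
)

# Columns to stack; with many columns, grep("^y", names(VE), value = TRUE) would pick them all
year_cols <- c("y1", "y2", "y3")

# One small data frame per year column, bound together (replaces the q/p/x copies)
Table <- do.call(rbind, lapply(year_cols, function(v) {
  data.frame(gender = VE$gender, year = v, group = VE[[v]],
             stringsAsFactors = FALSE)
}))

ggplot(Table, aes(year)) +
  geom_bar(aes(fill = group), position = "stack") +
  facet_grid(gender ~ .)
```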

Is there a better way to get this bar chart? My real dataset has 3,000,000 observations with 32 variables each. Any help with this chart is appreciated. Cheers!

First, melt your data.frame into a 'long' format. For this I created an ID variable; the three variables 'y1', 'y2', and 'y3' are then stacked into a single column. You can then use ggplot2 with geom_bar(), which counts the values of the x aesthetic when no y aesthetic is provided.

library(ggplot2)

# create data frame
df <- data.frame(ID = 1:15, 
             gender = c('M', 'F', 'F', 'F', 'M', 'M', 'F', 'F', 'F', 'M', 'F', 'F', 'M', 'M', 'F'),
             type = toupper(c('p', 'p', 'w', 'w', 'p', 'p', 'w', 'w', 'w', 'w', 'p', 'p', 'p', 'W', 'W')),
             y1 = c('yes', 'yes', 'null', 'no', 'no', 'no', 'yes', 'null', 'no', 'yes', 'yes', 'yes', 'null', 'no', 'no'),
             y2 = c('yes', 'null', 'no', 'no', 'no', 'yes', 'yes', 'yes', 'null', 'no', 'yes', 'null', 'no', 'yes', 'yes'),
             y3 = c('no', 'no', 'no', 'yes', 'null', 'yes', 'null', 'no', 'no', 'no', 'yes', 'yes', 'null', 'no', 'no'),
             stringsAsFactors = TRUE)

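# melt data frame to long format
# (convert to a data.table first so data.table's own melt method is used
#  instead of redirecting a plain data.frame to reshape2)
df_melt <- data.table::melt(data.table::as.data.table(df[, c(1, 4:6)]), id.vars = "ID")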

# set correct levels for factor (needed for the legend)
df_melt$value <- factor(df_melt$value, levels = c("yes", "no", "null"))

# add ggplot
ggplot(data = df_melt) + 
  geom_bar(aes(x = variable, fill = value, colour = value)) +
  ylab("count") +
  xlab("year")

Which returns:

[image: output_ggplot]
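The plot above pools both genders. To get the gender facets from the question, and to avoid having ggplot count millions of rows itself, one option (not part of the original answer) is to reshape and count with data.table first, then draw the pre-computed counts with geom_col(). This is a sketch assuming the data.table package is installed; the column names year and group are chosen to match the question's Table:

```r
library(data.table)
library(ggplot2)

# Example data from the question, as a data.table
VE <- data.table(
  gender = c("M","F","F","F","M","M","F","F","F","M","F","F","M","M","F"),
  type   = c("P","P","W","W","P","P","W","W","W","W","P","P","P","W","W"),
  y1 = c("yes","yes","null","no","no","no","yes","null","no","yes","yes","yes","null","no","no"),
  y2 = c("yes","null","no","no","no","yes","yes","yes","null","no","yes","null","no","yes","yes"),
  y3 = c("no","no","no","yes","null","yes","null","no","no","no","yes","yes","null","no","no")
)

# Long format: one row per (respondent, year)
long <- melt(VE, id.vars = c("gender", "type"),
             measure.vars = c("y1", "y2", "y3"),
             variable.name = "year", value.name = "group")

# Count first, so ggplot only receives one row per bar segment
counts <- long[, .N, by = .(gender, year, group)]
counts[, group := factor(group, levels = c("yes", "no", "null"))]

ggplot(counts, aes(x = year, y = N, fill = group)) +
  geom_col() +
  facet_grid(gender ~ .) +
  labs(x = "year", y = "count")
```

geom_col() stacks by default, so the bars match geom_bar() on the raw rows; the difference is that the counting happens once in data.table rather than inside the plotting layer.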
