Plotting multiple binary variables on the same plot in ggplot

I want to use ggplot to draw a bar plot of the frequencies (or just the percentage of 1s) of several binary variables, but I am having trouble getting them all onto one plot.

The variables all stem from the same survey question, so ideally the data would be tidy, with a single column for this variable. However, respondents could select more than one option, and I want to retain that rather than collapse it into a "more than one selected" category. Here is a slice of the data:

df <- structure(list(gender = structure(c("Male", "Male", "Female",
"Female", "Female", "Female", "Male", "Male", "Male", "Male"), label = "Q4", format.stata = "%24s"), 
    var1 = structure(c("0", "0", "1", "1", "0", "0", "0", "0", 
    "0", "0"), format.stata = "%9s"), var2 = structure(c("0", 
    "98", "1", "0", "0", "0", "0", "0", "0", "0"), format.stata = "%9s"), 
    var3 = structure(c("0", "0", "0", "0", "0", "0", "0", "0", 
    "0", "0"), format.stata = "%9s"), var4 = structure(c("1", 
    "0", "1", "0", "0", "0", "1", "1", "0", "0"), format.stata = "%9s"), 
    var5 = structure(c("1", "0", "0", "0", "0", "1", "0", "0", 
    "0", "0"), format.stata = "%9s")), row.names = c(NA, -10L
), class = c("tbl_df", "tbl", "data.frame"))

Reshape the data to long format so that it is easier to plot.

library(tidyverse)

df %>%
  pivot_longer(cols = starts_with('var')) %>%
  group_by(name) %>%
  summarise(frequency_of_1 = sum(value == 1)) %>%
  #If you need percentage use mean instead of sum
  #summarise(frequency_of_1 = mean(value == 1)) %>%
  ggplot() + aes(name, frequency_of_1) + geom_col()
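
If percentages are what you want, the mean-based variant mentioned in the comments could look roughly like the sketch below. The small data frame and the names pct and share_of_1 are made up for illustration and are not the asker's real data; the y-axis formatting relies on scales::percent from the scales package, which is installed alongside ggplot2.

```r
library(tidyverse)

# Hypothetical stand-in for the survey data: 0/1 answers stored as strings
df <- tibble(
  gender = c("Male", "Female", "Female", "Male"),
  var1   = c("0", "1", "1", "0"),
  var2   = c("0", "1", "0", "0"),
  var3   = c("1", "1", "0", "1")
)

# One row per respondent/option pair, then the share of "1" per option
pct <- df %>%
  pivot_longer(cols = starts_with("var")) %>%
  group_by(name) %>%
  summarise(share_of_1 = mean(value == "1"))

# Bar plot with the y axis shown as a percentage
p <- ggplot(pct, aes(name, share_of_1)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent) +
  labs(x = NULL, y = "Share selecting option")
print(p)
```

Because each option keeps its own row in the long data, a respondent who picked several options is counted once under each of them, which is the behaviour the question asks to retain.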


In base R you can do this with colSums and barplot.

barplot(colSums(df[-1] == 1))
#For percentage
#barplot(colMeans(df[-1] == 1))
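
One thing to watch in the sample data is the value "98" in var2. If that is a non-response code (an assumption here; the question does not say), counting it in the denominator lowers the percentage. A minimal base R sketch that drops such codes before averaging, using made-up data and hypothetical names m and pct:

```r
# Hypothetical stand-in data; "98" is ASSUMED to be a non-response code
df <- data.frame(
  gender = c("Male", "Female", "Female", "Male"),
  var1   = c("0", "1", "1", "0"),
  var2   = c("0", "98", "1", "0"),
  stringsAsFactors = FALSE
)

# Work on a character matrix of the answer columns only
m <- as.matrix(df[-1])

# Anything that is not "0" or "1" is treated as missing
m[!(m %in% c("0", "1"))] <- NA

# Share of 1s per column, ignoring the missing answers
pct <- colMeans(m == "1", na.rm = TRUE)

barplot(pct * 100, ylab = "% selecting option")
```

If "98" should instead count as "not selected", skip the NA step and use colMeans(df[-1] == 1) as above.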
