ggplot2 in R: stacked barplot of unrelated variables, converting each variable to percentages based on presence/absence

Question

The following is a sample data frame:

df <- data.frame(SampleID = c(1, 2, 3, 4, 5, 6, 7, 8),
                 Var1 = c(0.1 , 0.5,    0.7,    0,  0,  0,  0.5,    0.2), 
                 Var1PA = c("Present", "Present", "Present", "Absent", "Absent", "Absent",  "Present", "Present"), 
                 Var2 = c(0, 0, 0, 0, 0.1, 0.5, 0.7, 0.2), 
                 Var2PA = c("Absent", "Absent", "Absent", "Absent", "Present", "Present", "Present", "Present"))

My question started off as seemingly simple, but I could not find a way to reshape the data frame suitably to plot a barplot.

For Var1, I want to plot a stacked barplot of the percentage of samples in which Var1 was present (i.e. Var1 value > 0) or absent, and similarly for Var2 and so on.

I could determine this percentage by:

(1 - sum(df$Var1 == 0) / length(df$Var1)) * 100

But how do I convert this into a percentage while plotting? I looked at many melt options, but there is no unifying criterion for these variables that would make a common X axis.

Finally, how does one answer the question above if I want to plot 5 variables from a data frame of 1000 such column variables?

Edit: Thanks for the answers so far! I have a slight edit to the question: I just added one more variable to my data frame.
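As a minimal sketch of the percentage calculation applied to several columns at once, the base R snippet below computes the share of samples in which each selected variable is present. The vector name `wanted` is hypothetical; with 1000 columns it would simply list the 5 column names of interest. This only assumes, as in the question, that "present" means a value greater than 0.

```r
# Sample data from the question (numeric columns only)
df <- data.frame(SampleID = c(1, 2, 3, 4, 5, 6, 7, 8),
                 Var1 = c(0.1, 0.5, 0.7, 0, 0, 0, 0.5, 0.2),
                 Var2 = c(0, 0, 0, 0, 0.1, 0.5, 0.7, 0.2))

# Hypothetical selection: name the columns you want to summarise
wanted <- c("Var1", "Var2")

# df[, wanted] > 0 is a logical matrix; the column means are the
# proportions of samples in which each variable is present
pct_present <- colMeans(df[, wanted, drop = FALSE] > 0) * 100

# One column per variable, one row per status
pct <- rbind(Present = pct_present, Absent = 100 - pct_present)
print(pct)
```

Because the selection is just a character vector of column names, the same code works unchanged for any subset of a wide data frame.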

df <- data.frame(SampleID = c(1, 2, 3, 4, 5, 6, 7, 8),
             Var1 = c(0.1 , 0.5,    0.7,    0,  0,  0,  0.5,    0.2), 
             Var1PA = c("Present", "Present", "Present", "Absent", "Absent", "Absent",  "Present", "Present"), 
             Var2 = c(0, 0, 0, 0, 0.1, 0.5, 0.7, 0.2), 
             Var2PA = c("Absent", "Absent", "Absent", "Absent", "Present", "Present", "Present", "Present"),
             Disease = c("Case", "Control", "Case", "Control", "Case", "Control", "Case", "Control"))

I am trying to figure out how to plot the barplot for cases and controls, with presence and absence stacked within them, for Var1PA, Var2PA and so on. If I have the right data frame input, the ggplot2 code would be:

vars <- c('Var1PA', 'Var2PA', 'Var2PA')
## based on the first comment by @rawr
tt <- data.frame(prop.table(as.table(sapply(df[, vars], table)), 2) * 100)
ggplot(tt, aes(Disease, Freq)) +
    geom_bar(aes(fill = Var1), position = "stack", stat = "identity") +
    facet_grid(~vars)

How do I get percentages for cases (present and absent) and controls (present and absent) for each of the vars? Thanks!
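One possible way to get these per-group percentages, sketched here under the assumption that the percentages should sum to 100 within each combination of variable and Disease group, is to stack the PA columns into long form and normalise a three-way table. The names `long` and `tt` are illustrative only, and the plotting step is skipped if ggplot2 is not installed.

```r
df <- data.frame(
  SampleID = c(1, 2, 3, 4, 5, 6, 7, 8),
  Var1PA = c("Present", "Present", "Present", "Absent", "Absent", "Absent", "Present", "Present"),
  Var2PA = c("Absent", "Absent", "Absent", "Absent", "Present", "Present", "Present", "Present"),
  Disease = c("Case", "Control", "Case", "Control", "Case", "Control", "Case", "Control")
)

vars <- c("Var1PA", "Var2PA")

# Stack the selected PA columns into long form: one row per sample and variable
long <- do.call(rbind, lapply(vars, function(v) {
  data.frame(Var = v, Disease = df$Disease, PA = df[[v]])
}))

# Count Var x Disease x PA, then normalise within each Var x Disease cell
counts <- table(Var = long$Var, Disease = long$Disease, PA = long$PA)
tt <- as.data.frame(prop.table(counts, margin = c(1, 2)))
tt$Pct <- tt$Freq * 100
print(tt)

# Stacked bars per Disease group, one facet per variable
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(tt, aes(x = Disease, y = Pct, fill = PA)) +
    geom_col() +
    facet_grid(~ Var)
  print(p)
}
```

The `margin = c(1, 2)` argument is what makes each Case or Control bar within a facet add up to 100%.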

Answer 1

This should generalize nicely. You can, of course, be more selective about the variables you pick.

library(dplyr)
library(tidyr)
library(ggplot2)  # needed for the ggplot() call below
mdf = df %>% select(SampleID, ends_with("PA")) %>%
    gather(key = Var, value = PA, -SampleID) %>%
    mutate(PA = factor(PA, levels = c("Present", "Absent")))

ggplot(mdf, aes(x = Var, fill = PA)) +
    geom_bar(position = "fill") +
    scale_y_continuous(labels = scales::percent)
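The reshaping step above uses gather(), which current tidyr documents as superseded by pivot_longer(). As a sketch of the same reshape with the newer function, and with an explicit subset of columns (the vector `wanted` and the name `mdf2` are hypothetical), something like the following should produce an equivalent long data frame:

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  SampleID = c(1, 2, 3, 4, 5, 6, 7, 8),
  Var1PA = c("Present", "Present", "Present", "Absent", "Absent", "Absent", "Present", "Present"),
  Var2PA = c("Absent", "Absent", "Absent", "Absent", "Present", "Present", "Present", "Present")
)

# Hypothetical subset: list only the presence/absence columns to plot
wanted <- c("Var1PA", "Var2PA")

mdf2 <- df %>%
  select(SampleID, all_of(wanted)) %>%
  pivot_longer(cols = -SampleID, names_to = "Var", values_to = "PA") %>%
  mutate(PA = factor(PA, levels = c("Present", "Absent")))
```

The rows come out ordered by sample rather than by variable, which does not matter for the bar chart because geom_bar() counts rows per group.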

You can add the percentage columns to the long data frame:

mdf %>% group_by(Var) %>%
    mutate(p_present = mean(PA == "Present"),
           p_absent = mean(PA == "Absent"))
# Source: local data frame [16 x 5]
# Groups: Var [2]
# 
#    SampleID    Var      PA p_present p_absent
#       <dbl>  <chr>  <fctr>     <dbl>    <dbl>
# 1         1 Var1PA Present     0.625    0.375
# 2         2 Var1PA Present     0.625    0.375
# 3         3 Var1PA Present     0.625    0.375
# 4         4 Var1PA  Absent     0.625    0.375
# 5         5 Var1PA  Absent     0.625    0.375
# 6         6 Var1PA  Absent     0.625    0.375
# 7         7 Var1PA Present     0.625    0.375
# 8         8 Var1PA Present     0.625    0.375
# 9         1 Var2PA  Absent     0.500    0.500
# 10        2 Var2PA  Absent     0.500    0.500

Or, if you'd rather see a one-line-per-group summary, replace mutate with summarize:

mdf %>% group_by(Var) %>%
    summarize(p_present = mean(PA == "Present"),
           p_absent = mean(PA == "Absent"))
# # A tibble: 2 × 3
#      Var p_present p_absent
#    <chr>     <dbl>    <dbl>
# 1 Var1PA     0.625    0.375
# 2 Var2PA     0.500    0.500

Answer 2

My solution for this:

library(ggplot2)
library(reshape)
library(dplyr)

df <- data.frame(
  SampleID = c(1, 2, 3, 4, 5, 6, 7, 8),
  Var1 = c(0.1, 0.5, 0.7, 0, 0, 0, 0.5, 0.2),
  Var1PA = c("Present", "Present", "Present", "Absent", "Absent", "Absent", "Present", "Present"),
  Var2 = c(0, 0, 0, 0, 0.1, 0.5, 0.7, 0.2),
  Var2PA = c("Absent", "Absent", "Absent", "Absent", "Present", "Present", "Present", "Present")
)

reshape::melt(df, c('SampleID')) |> 
  filter(variable == 'Var1' | variable == 'Var2') |> 
  mutate(value1 = ifelse(value == 0, 'Absent', 'Present')) |> 
  group_by(variable) |> count(variable, value1) |> 
  mutate(
    prc = n/sum(n)
  ) |>  as.data.frame() |> 
  ggplot( aes(x = variable, y = prc, fill = value1)) +
    geom_bar(stat = 'identity', position = 'fill', width = 0.7) +
    scale_y_continuous(labels = scales::percent) +
    labs(fill = 'Presence status') +
    geom_text(aes(x = variable, y = prc, label = after_stat(y)),  # after_stat() replaces the deprecated stat()
              position = position_fill(vjust = 0.5))
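The labels in the plot above are raw proportions. If percentage labels are preferred, one option is to precompute them as strings before plotting. The sketch below does the counting in base R from the numeric columns; the names `num_vars`, `counts` and `label` are illustrative, and the plot is only drawn if ggplot2 is installed.

```r
df <- data.frame(
  SampleID = c(1, 2, 3, 4, 5, 6, 7, 8),
  Var1 = c(0.1, 0.5, 0.7, 0, 0, 0, 0.5, 0.2),
  Var2 = c(0, 0, 0, 0, 0.1, 0.5, 0.7, 0.2)
)

num_vars <- c("Var1", "Var2")

# Flatten the selected columns and classify every value
status <- ifelse(unlist(df[num_vars]) > 0, "Present", "Absent")
variable <- rep(num_vars, each = nrow(df))

# Counts per variable and status, then proportions within each variable
counts <- as.data.frame(table(variable = variable, status = status))
counts$prc <- counts$Freq / ave(counts$Freq, counts$variable, FUN = sum)
counts$label <- sprintf("%.1f%%", 100 * counts$prc)
print(counts)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(counts, aes(x = variable, y = prc, fill = status)) +
    geom_col(width = 0.7) +
    scale_y_continuous(labels = scales::percent) +
    labs(fill = "Presence status") +
    geom_text(aes(label = label), position = position_stack(vjust = 0.5))
  print(p)
}
```

Since the proportions already sum to 1 within each variable, plain stacking gives the same picture as position = 'fill'.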


Answer 1
2, accepted, 2016-11-12 04:48:16

Answer 2
0, 2022-03-23 23:36:54
