Stacked barplot with percentage in R ggplot2 for categorical variables from scratch
Stacked barplot converting a variable into a presence/absence-based percentage for unrelated variables in ggplot2 (R)
Here is a sample data frame:
df <- data.frame(SampleID = c(1, 2, 3, 4, 5, 6, 7, 8),
Var1 = c(0.1 , 0.5, 0.7, 0, 0, 0, 0.5, 0.2),
Var1PA = c("Present", "Present", "Present", "Absent", "Absent", "Absent", "Present", "Present"),
Var2 = c(0, 0, 0, 0, 0.1, 0.5, 0.7, 0.2),
Var2PA = c("Absent", "Absent", "Absent", "Absent", "Present", "Present", "Present", "Present"))
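My question seemed simple at first, but I cannot find a way to reshape the data frame appropriately for a bar plot.
For Var1, I want a stacked bar showing the percentage of samples in which Var1 is present (i.e. Var1 value > 0) versus absent, and likewise for Var2 and so on.
I can compute this percentage as follows: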
(1 - sum(df$Var1 == 0) / length(df$Var1)) * 100
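But how do I turn this into a percentage when plotting? I have looked at many melt options, but these variables share no common criterion that could serve as a common X axis.
Finally, how would the answer change if I wanted to plot just 5 variables out of 1000 such columns in the data frame?
Edit: Thanks for the answers so far! I have edited the question slightly; I just added one variable to my data frame: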
df <- data.frame(SampleID = c(1, 2, 3, 4, 5, 6, 7, 8),
Var1 = c(0.1 , 0.5, 0.7, 0, 0, 0, 0.5, 0.2),
Var1PA = c("Present", "Present", "Present", "Absent", "Absent", "Absent", "Present", "Present"),
Var2 = c(0, 0, 0, 0, 0.1, 0.5, 0.7, 0.2),
Var2PA = c("Absent", "Absent", "Absent", "Absent", "Present", "Present", "Present", "Present"),
Disease = c("Case", "Control", "Case", "Control", "Case", "Control", "Case", "Control"))
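I want to figure out how to plot stacked presence/absence bars for Var1PA, Var2PA, etc., split by cases and controls. If I had the right data frame as input, the ggplot2 code would be:
vars <- c('Var1PA', 'Var2PA', 'Var2PA')
## based on the first comment by @rawr
tt <- data.frame(prop.table(as.table(sapply(df[, vars], table)), 2) * 100)
ggplot(tt, aes(Disease, Freq)) +
  geom_bar(aes(fill = Var1), position = "stack", stat = "identity") + facet_grid(~vars)
How do I get the percentages of cases (present and absent) and controls (present and absent) for each variable? Thanks!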
This should generalize well. Of course, you can be more selective about which variables you pick.
library(dplyr)
library(tidyr)
library(ggplot2)
# Reshape to long format: one row per sample and presence/absence variable
mdf = df %>% select(SampleID, ends_with("PA")) %>%
  gather(key = Var, value = PA, -SampleID) %>%
  mutate(PA = factor(PA, levels = c("Present", "Absent")))
# position = "fill" scales every bar to 100%
ggplot(mdf, aes(x = Var, fill = PA)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent)
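As a hedged sketch of being "more selective": assuming the columns to plot are known by name (the vector `keep` and the objects `mdf_sel` and `p_sel` below are illustrative, not part of the original answer), the same reshape can be written with `tidyr::pivot_longer()`, which tidyr recommends over `gather()` in newer code. With 1000 columns, only the names listed in `keep` would be reshaped and plotted.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Sample data from the question (only the columns needed here)
df <- data.frame(
  SampleID = 1:8,
  Var1PA = c("Present", "Present", "Present", "Absent",
             "Absent", "Absent", "Present", "Present"),
  Var2PA = c("Absent", "Absent", "Absent", "Absent",
             "Present", "Present", "Present", "Present")
)

# Hypothetical selection: list only the columns that should be plotted
keep <- c("Var1PA", "Var2PA")

# Long format: one row per sample and selected variable
mdf_sel <- df %>%
  select(SampleID, all_of(keep)) %>%
  pivot_longer(cols = all_of(keep), names_to = "Var", values_to = "PA") %>%
  mutate(PA = factor(PA, levels = c("Present", "Absent")))

# One 100% stacked bar per selected variable
p_sel <- ggplot(mdf_sel, aes(x = Var, fill = PA)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent)
print(p_sel)
```

Using `all_of()` makes the selection fail loudly if a listed column is missing, which may be preferable when the names come from a long hand-written list.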
You can add percentage columns to the long data frame:
mdf %>% group_by(Var) %>%
mutate(p_present = mean(PA == "Present"),
p_absent = mean(PA == "Absent"))
# Source: local data frame [16 x 5]
# Groups: Var [2]
#
# SampleID Var PA p_present p_absent
# <dbl> <chr> <fctr> <dbl> <dbl>
# 1 1 Var1PA Present 0.625 0.375
# 2 2 Var1PA Present 0.625 0.375
# 3 3 Var1PA Present 0.625 0.375
# 4 4 Var1PA Absent 0.625 0.375
# 5 5 Var1PA Absent 0.625 0.375
# 6 6 Var1PA Absent 0.625 0.375
# 7 7 Var1PA Present 0.625 0.375
# 8 8 Var1PA Present 0.625 0.375
# 9 1 Var2PA Absent 0.500 0.500
# 10 2 Var2PA Absent 0.500 0.500
Or, if you would rather see a one-row summary per group, replace mutate with summarize:
mdf %>% group_by(Var) %>%
summarize(p_present = mean(PA == "Present"),
p_absent = mean(PA == "Absent"))
# # A tibble: 2 × 3
# Var p_present p_absent
# <chr> <dbl> <dbl>
# 1 Var1PA 0.625 0.375
# 2 Var2PA 0.500 0.500
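For the edited question (cases versus controls), one possible approach is to keep `Disease` as an identifier column during the reshape, put it on the x axis, and facet by variable. The sketch below assumes that layout; the names `mdf2`, `pct`, and `p` are illustrative. With `position = "fill"`, each bar is normalized to 100% within its variable and disease group, and `pct` holds the same percentages as numbers.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Sample data from the edited question (presence/absence columns plus Disease)
df <- data.frame(
  SampleID = 1:8,
  Var1PA = c("Present", "Present", "Present", "Absent",
             "Absent", "Absent", "Present", "Present"),
  Var2PA = c("Absent", "Absent", "Absent", "Absent",
             "Present", "Present", "Present", "Present"),
  Disease = c("Case", "Control", "Case", "Control",
              "Case", "Control", "Case", "Control")
)

# Long format; SampleID and Disease stay as identifier columns
mdf2 <- df %>%
  pivot_longer(cols = ends_with("PA"), names_to = "Var", values_to = "PA") %>%
  mutate(PA = factor(PA, levels = c("Present", "Absent")))

# Percentage of Present/Absent within each variable and disease group
pct <- mdf2 %>%
  count(Var, Disease, PA) %>%
  group_by(Var, Disease) %>%
  mutate(percent = 100 * n / sum(n)) %>%
  ungroup()

# One panel per variable, one 100% stacked bar per disease group
p <- ggplot(mdf2, aes(x = Disease, fill = PA)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  facet_grid(~ Var)
print(p)
```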
My solution:
library(ggplot2)
library(reshape)
library(dplyr)
df <- data.frame(
SampleID = c(1, 2, 3, 4, 5, 6, 7, 8),
Var1 = c(0.1, 0.5, 0.7, 0, 0, 0, 0.5, 0.2),
Var1PA = c("Present", "Present", "Present", "Absent", "Absent", "Absent", "Present", "Present"),
Var2 = c(0, 0, 0, 0, 0.1, 0.5, 0.7, 0.2),
Var2PA = c("Absent", "Absent", "Absent", "Absent", "Present", "Present", "Present", "Present")
)
# Reshape to long format, keep the numeric columns, derive presence/absence,
# then compute the share of each status per variable (|> needs R >= 4.1)
reshape::melt(df, c('SampleID')) |>
  filter(variable == 'Var1' | variable == 'Var2') |>
  mutate(value1 = ifelse(value == 0, 'Absent', 'Present')) |>
  group_by(variable) |> count(variable, value1) |>
  mutate(
    prc = n/sum(n)
  ) |> as.data.frame() |>
  ggplot(aes(x = variable, y = prc, fill = value1)) +
  geom_bar(stat = 'identity', position = 'fill', width = 0.7) +
  scale_y_continuous(labels = scales::percent) +
  labs(fill = 'Presence status') +
  # after_stat() replaces stat(), which is deprecated in recent ggplot2
  geom_text(aes(x = variable, y = prc, label = after_stat(y)),
            position = position_fill(vjust = 0.5))
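As a quick cross-check of the plotted proportions, the presence rate can also be computed directly from the numeric columns in base R. This is a minimal sketch, assuming "present" means a value strictly greater than 0 as in the question; `num_vars` and `pct_present` are illustrative names.

```r
# Numeric columns from the question's sample data
df <- data.frame(
  Var1 = c(0.1, 0.5, 0.7, 0, 0, 0, 0.5, 0.2),
  Var2 = c(0, 0, 0, 0, 0.1, 0.5, 0.7, 0.2)
)

# Columns to check; with many columns, list only the ones of interest
num_vars <- c("Var1", "Var2")

# df[num_vars] > 0 is a logical matrix; column means give the share present
pct_present <- colMeans(df[num_vars] > 0) * 100
print(pct_present)
# Var1 Var2
# 62.5 50.0
```

The Var1 value agrees with the question's own formula, `(1 - sum(df$Var1 == 0) / length(df$Var1)) * 100`.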