The following is a sample data frame:
df <- data.frame(SampleID = c(1, 2, 3, 4, 5, 6, 7, 8),
Var1 = c(0.1 , 0.5, 0.7, 0, 0, 0, 0.5, 0.2),
Var1PA = c("Present", "Present", "Present", "Absent", "Absent", "Absent", "Present", "Present"),
Var2 = c(0, 0, 0, 0, 0.1, 0.5, 0.7, 0.2),
Var2PA = c("Absent", "Absent", "Absent", "Absent", "Present", "Present", "Present", "Present"))
My question seemed simple at first, but I could not find a way to reshape the data frame suitably for a barplot.
For Var1, I want to plot a stacked barplot of the percentage of samples in which Var1 was present (i.e. Var1 > 0) or absent, and similarly for Var2 and so on.
I could determine this percentage by:
(1 - sum(df$Var1 == 0) / length(df$Var1)) * 100
But how do I convert this into a percentage while plotting? I looked at many melt options, but there is no unifying criterion for these variables that would make a common x axis.
Finally, how does one answer the question above if I want to plot 5 variables from a dataframe of 1000 such column variables?
Edit: Thanks for the answers so far! I have a slight edit to the question: I just added one more variable to my data frame.
df <- data.frame(SampleID = c(1, 2, 3, 4, 5, 6, 7, 8),
Var1 = c(0.1 , 0.5, 0.7, 0, 0, 0, 0.5, 0.2),
Var1PA = c("Present", "Present", "Present", "Absent", "Absent", "Absent", "Present", "Present"),
Var2 = c(0, 0, 0, 0, 0.1, 0.5, 0.7, 0.2),
Var2PA = c("Absent", "Absent", "Absent", "Absent", "Present", "Present", "Present", "Present"),
Disease = c("Case", "Control", "Case", "Control", "Case", "Control", "Case", "Control"))
I am trying to figure out how to plot the barplot for cases and controls, with presence/absence stacked within them, for Var1PA, Var2PA and so on. If I had the right data frame as input, the ggplot2 code would be:
vars <- c('Var1PA', 'Var2PA', 'Var2PA')
## based on the first comment by @rawr
tt <- data.frame(prop.table(as.table(sapply(df[, vars], table)), 2) * 100)
ggplot(tt, aes(Disease, Freq)) +
geom_bar(aes(fill = Var1), position = "stack", stat="identity") + facet_grid(~vars)
How do I get percentages for cases (present and absent) and controls (present and absent) for each of the vars? Thanks!
This should generalize nicely. You can, of course, be more selective about the variables you pick.
library(dplyr)
library(tidyr)
library(ggplot2)
mdf = df %>% select(SampleID, ends_with("PA")) %>%
gather(key = Var, value = PA, -SampleID) %>%
mutate(PA = factor(PA, levels = c("Present", "Absent")))
ggplot(mdf, aes(x = Var, fill = PA)) +
geom_bar(position = "fill") +
scale_y_continuous(labels = scales::percent)
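To illustrate being "more selective", here is a minimal sketch that plots only a named subset of columns, which is one way to handle 5 variables out of 1000. The vector name keep and the object name mdf_subset are hypothetical, and pivot_longer() is used here in place of gather() (tidyr now recommends it); this is an assumption about how one might adapt the code above, not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Sample data from the question (only the columns needed here)
df <- data.frame(
  SampleID = 1:8,
  Var1PA = c("Present", "Present", "Present", "Absent",
             "Absent", "Absent", "Present", "Present"),
  Var2PA = c("Absent", "Absent", "Absent", "Absent",
             "Present", "Present", "Present", "Present")
)

# Hypothetical: names of the columns to plot. With 1000 columns,
# this vector would list just the handful of interest.
keep <- c("Var1PA", "Var2PA")

mdf_subset <- df %>%
  select(SampleID, all_of(keep)) %>%           # errors if a name is missing
  pivot_longer(cols = all_of(keep),
               names_to = "Var", values_to = "PA") %>%
  mutate(PA = factor(PA, levels = c("Present", "Absent")))

# Same plot as above, restricted to the chosen variables
p <- ggplot(mdf_subset, aes(x = Var, fill = PA)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent)
```

Printing p draws the plot. all_of() is strict about missing names, which helps catch typos when the column list is long.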
You can add the percentage columns to the long data frame:
mdf %>% group_by(Var) %>%
mutate(p_present = mean(PA == "Present"),
p_absent = mean(PA == "Absent"))
# Source: local data frame [16 x 5]
# Groups: Var [2]
#
# SampleID Var PA p_present p_absent
# <dbl> <chr> <fctr> <dbl> <dbl>
# 1 1 Var1PA Present 0.625 0.375
# 2 2 Var1PA Present 0.625 0.375
# 3 3 Var1PA Present 0.625 0.375
# 4 4 Var1PA Absent 0.625 0.375
# 5 5 Var1PA Absent 0.625 0.375
# 6 6 Var1PA Absent 0.625 0.375
# 7 7 Var1PA Present 0.625 0.375
# 8 8 Var1PA Present 0.625 0.375
# 9 1 Var2PA Absent 0.500 0.500
# 10 2 Var2PA Absent 0.500 0.500
Or, if you'd rather see a one-line-per-group summary, replace mutate with summarize:
mdf %>% group_by(Var) %>%
summarize(p_present = mean(PA == "Present"),
p_absent = mean(PA == "Absent"))
# # A tibble: 2 × 3
# Var p_present p_absent
# <chr> <dbl> <dbl>
# 1 Var1PA 0.625 0.375
# 2 Var2PA 0.500 0.500
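For the edited question (percent present/absent within cases and controls), the same long-format idea might be extended by keeping Disease as an extra grouping column. This is only a sketch under that assumption: the names mdf_disease and pct are hypothetical, and pivot_longer() stands in for gather().

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Sample data from the question's edit (only the columns needed here)
df <- data.frame(
  SampleID = 1:8,
  Var1PA = c("Present", "Present", "Present", "Absent",
             "Absent", "Absent", "Present", "Present"),
  Var2PA = c("Absent", "Absent", "Absent", "Absent",
             "Present", "Present", "Present", "Present"),
  Disease = c("Case", "Control", "Case", "Control",
              "Case", "Control", "Case", "Control")
)

# Long format, carrying Disease along with each sample
mdf_disease <- df %>%
  select(SampleID, Disease, ends_with("PA")) %>%
  pivot_longer(cols = ends_with("PA"),
               names_to = "Var", values_to = "PA") %>%
  mutate(PA = factor(PA, levels = c("Present", "Absent")))

# Percent present/absent within each variable and disease group
pct <- mdf_disease %>%
  count(Var, Disease, PA) %>%
  group_by(Var, Disease) %>%
  mutate(pct = 100 * n / sum(n)) %>%
  ungroup()

# One panel per variable, cases and controls on the x axis
p <- ggplot(mdf_disease, aes(x = Disease, fill = PA)) +
  geom_bar(position = "fill") +
  facet_grid(~ Var) +
  scale_y_continuous(labels = scales::percent)
```

For the sample data, pct shows Var1PA as 75% present among cases and 50% among controls. Note that count() drops combinations with zero rows, so a group in which a variable is never present would have no "Present" row rather than a 0% row.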
My solution for this
library(ggplot2)
library(reshape)
library(dplyr)
df <- data.frame(
SampleID = c(1, 2, 3, 4, 5, 6, 7, 8),
Var1 = c(0.1, 0.5, 0.7, 0, 0, 0, 0.5, 0.2),
Var1PA = c("Present", "Present", "Present", "Absent", "Absent", "Absent", "Present", "Present"),
Var2 = c(0, 0, 0, 0, 0.1, 0.5, 0.7, 0.2),
Var2PA = c("Absent", "Absent", "Absent", "Absent", "Present", "Present", "Present", "Present")
)
reshape::melt(df, c('SampleID')) |>
filter(variable == 'Var1' | variable == 'Var2') |>
mutate(value1 = ifelse(value == 0, 'Absent', 'Present')) |>
group_by(variable) |> count(variable, value1) |>
mutate(
prc = n/sum(n)
) |> as.data.frame() |>
ggplot( aes(x = variable, y = prc, fill = value1)) +
geom_bar(stat = 'identity', position = 'fill', width = 0.7) +
scale_y_continuous(labels = scales::percent) +
labs(fill = 'Presence status') +
# label each segment with its percentage (stat() is deprecated in recent ggplot2)
geom_text(aes(x = variable, y = prc, label = scales::percent(prc)),
position = position_fill(vjust = 0.5))