How to add 95% confidence intervals to graph of proportions of factor levels in ggplot?
I'd like to build on the great answer I got to a question I asked previously.
I'd like to use this code as a starting point:
var1 <- c("Left", "Right", NA, "Left", "Right", "Right", "Right", "Left", "Left", "Right", "Left", "Left","Left", "Right", "Left", "Right", "Right", "Right", "Left", "Left", "Right", NA, "Left", "Left","Left", "Right", NA, "Left", "Right", "Right", "Right", "Left", "Left", "Right", "Left", "Left","Left", "Right", "Left", "Right", "Right", "Right", "Left", "Left", "Right", NA, "Left", "Left")
var2 <- c("Higher", "Lower", NA, "Slightly higher", "Slightly higher", "Slightly higher", "Lower", "Slightly higher", "Higher", "Higher", "Higher", "Slightly higher","Higher", "Lower", "Slightly higher", "Slightly higher", "Slightly higher", "Lower", "Slightly higher", "Higher", "Higher", "Higher", NA, "Slightly lower","Higher", "Lower", NA, "Slightly higher", "Slightly higher", "Slightly higher", "Lower", "Slightly higher", "Higher", "Higher", "Higher", "Slightly higher","Higher", "Lower", "Slightly higher", "Slightly higher", "Slightly higher", "Lower", "Slightly lower", "Higher", "Higher", "Higher", NA, "Slightly lower")
df <- as.data.frame(cbind(var1, var2))
library(dplyr)
library(ggplot2)
df %>%
  na.omit() %>%
  group_by(var1, var2) %>%
  summarise(n = n()) %>%      # result stays grouped by var1
  mutate(n = n/sum(n)) %>%    # proportion of each var2 answer within each var1 group
  ungroup() %>%
  ggplot() + aes(var2, n, fill = var1) +
  geom_bar(position = "dodge", stat = "identity") +
  labs(x = "Left or Right", y = "Proportion") +
  scale_y_continuous() +
  theme_classic() +
  theme(legend.position = "top") +
  # a single fill scale: a second scale_fill_*() call would replace the first
  scale_fill_manual(name = "Answer:", values = c("black", "red"))
I want to add error bars to each bar on the chart in the form of 95% confidence intervals. I tried adding in the terms
upperE=(1.96*sqrt(n/sum(n))*(1-(n/sum(n)))/n), lowerE=(-1.96*sqrt(n/sum(n))*(1-(n/sum(n)))/n).
But, alas, I keep getting errors...
I also tried making a whole new data frame for the graph, like so:
dat <- df %>%
  na.omit() %>%
  group_by(var1, var2) %>%
  summarise(n = n()) %>%
  mutate(prop = n/sum(n),
         upperE = 1.96*sqrt(n/sum(n))*(1-(n/sum(n)))/n,
         lowerE = -1.96*sqrt(n/sum(n))*(1-(n/sum(n)))/n)
test <- ggplot(dat, aes(x = var2, y = prop, fill = var1)) +
  geom_bar(position = "dodge", stat = "identity") +
  geom_errorbar(aes(ymin = lowerE, ymax = upperE), position = "dodge") +
  labs(x = "Answer", y = "Proportion") +
  scale_fill_discrete(name = "Condition:") + theme_classic() +
  theme(legend.position = "top")
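This gives me error bars, but they sit at 0 on the y-axis instead of on top of each bar...
Does anyone have any suggestions? Thanks!
I have now figured out how to get the error bars into the right position on each bar: I needed to tie the error bars' ymin and ymax specification to the value being plotted, like so: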
dat <- df %>%
  na.omit() %>%
  group_by(var1, var2) %>%
  summarise(n = n()) %>%
  mutate(prop = n/sum(n),
         upperE = 1.96*sqrt(n/sum(n))*(1-(n/sum(n)))/n,
         lowerE = -1.96*sqrt(n/sum(n))*(1-(n/sum(n)))/n)
test <- ggplot(dat, aes(x = var2, y = prop, fill = var1)) +
  geom_bar(position = "dodge", stat = "identity") +
  geom_errorbar(aes(ymin = prop + lowerE, ymax = prop + upperE),
                width = .2, position = position_dodge(.9)) +
  labs(x = "Answer", y = "Proportion") +
  scale_fill_discrete(name = "Condition:") + theme_classic() +
  theme(legend.position = "top")
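This gives:
[Plot: proportion per answer, bars dodged by condition, with error bars on top of each bar; image not preserved]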
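The standard-error formula for a proportion, as used in a 95% CI, is se = sqrt(p * (1 - p) / n). So I read the solution above as computing sqrt(n/sum(n) * (1 - n/sum(n)) / n). However, n there is only the count of successes; the full sample size is sum(n). So it should actually be sqrt(n/sum(n) * (1 - n/sum(n)) / sum(n)).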
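The point about the denominator can be checked numerically. Below is a minimal base-R sketch; `prop_se()` is a hypothetical helper introduced only for illustration, and the counts (10 answers in one category out of 40 in the group) are made up rather than taken from the question's data.

```r
# Hypothetical helper (not from the thread): standard error of a proportion,
# se = sqrt(p * (1 - p) / N), where N is the full sample size, i.e. sum(n)
prop_se <- function(successes, total) {
  p <- successes / total
  sqrt(p * (1 - p) / total)
}

# Hypothetical numbers: 10 answers in one category out of 40 in the group
n_success <- 10
n_total   <- 40
p <- n_success / n_total

se_right <- prop_se(n_success, n_total)      # divides by sum(n)
se_wrong <- sqrt(p * (1 - p) / n_success)    # divides by n, the mistake

# 95% Wald interval built from the correct standard error
ci <- p + c(-1, 1) * 1.96 * se_right

print(c(se_right = se_right, se_wrong = se_wrong))
print(ci)
```

Dividing by n instead of sum(n) inflates the standard error by a factor of sqrt(sum(n) / n), which with these numbers is exactly 2.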
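Super old thread, but just in case anyone still stumbles on this: the confidence-interval formula in the upvoted answer is incorrect.
It should be: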
mutate(prop = n/sum(n),
       upperE = 1.96*sqrt(n/sum(n)*(1-(n/sum(n)))/sum(n)),
       lowerE = -1.96*sqrt(n/sum(n)*(1-(n/sum(n)))/sum(n)))
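With the formula used for the confidence interval in that answer, you only take the square root of the first part of the formula. However, you need to take the square root of the entire expression (everything except the z-score, 1.96).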
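To see both corrections side by side, here is a base-R sketch (no dplyr or ggplot2 needed) that mimics the summarised data frame with hypothetical counts, not the question's data. It computes the upvoted answer's half-width and the corrected one, then turns the corrected half-width into absolute lower/upper bounds. The column names `total`, `half_old`, `half_new`, `lower` and `upper` are made up for this example.

```r
# Hypothetical counts (not the question's data), shaped like the output of
# df %>% na.omit() %>% group_by(var1, var2) %>% summarise(n = n())
dat <- data.frame(
  var1 = rep(c("Left", "Right"), each = 4),
  var2 = rep(c("Higher", "Lower", "Slightly higher", "Slightly lower"), times = 2),
  n    = c(9, 2, 8, 4, 6, 5, 7, 2)
)

# sum(n) within each var1 group (summarise() leaves the data grouped by var1)
dat$total <- ave(dat$n, dat$var1, FUN = sum)
dat$prop  <- dat$n / dat$total

z <- 1.96  # z-score for a two-sided 95% interval

# Upvoted answer: only p is under the square root, and it divides by n
dat$half_old <- z * sqrt(dat$prop) * (1 - dat$prop) / dat$n

# Corrected: square root of the whole p * (1 - p) / sum(n)
dat$half_new <- z * sqrt(dat$prop * (1 - dat$prop) / dat$total)

# Absolute bounds on the proportion scale, usable as ymin / ymax
dat$lower <- dat$prop - dat$half_new
dat$upper <- dat$prop + dat$half_new

print(dat[, c("var1", "var2", "prop", "half_old", "half_new", "lower", "upper")],
      digits = 3)
```

This is a Wald interval, so it can extend below 0 for small counts: the "Right / Slightly lower" row here has p = 0.1 with N = 20, which gives a lower bound of about -0.031. If that matters, `binom.test(x, n)$conf.int` from base R's stats package returns an exact (Clopper-Pearson) interval instead.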