Preventing wrong density plots when coloring histograms according to groups
Based on some dummy data, I created a histogram with a density plot:
library(ggplot2)
set.seed(1234)
wdata = data.frame(
sex = factor(rep(c("F", "M"), each=200)),
weight = c(rnorm(200, 55), rnorm(200, 58))
)
a <- ggplot(wdata, aes(x = weight))
a + geom_histogram(aes(y = ..density..,
# color = sex
),
colour="black",
fill="white",
position = "identity") +
geom_density(alpha = 0.2,
# aes(color = sex)
) +
scale_color_manual(values = c("#868686FF", "#EFC000FF"))
The histogram of weight should be colored according to sex, so I use aes(y = ..density.., color = sex) for geom_histogram():
a + geom_histogram(aes(y = ..density..,
color = sex
),
colour="black",
fill="white",
position = "identity") +
geom_density(alpha = 0.2,
# aes(color = sex)
) +
scale_color_manual(values = c("#868686FF", "#EFC000FF"))
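As I want it to, the density plot stays the same (overall for both groups), but the histogram bars scale up (they now seem to be normalized separately for each group):
How can I prevent this from happening? I need individually colored histogram bars, but a joint density plot for all colored groups.
P.S. Using aes(color = sex) for geom_density() brings everything back to the original scale, but I don't want separate density plots (as shown below):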
a + geom_histogram(aes(y = ..density..,
color = sex
),
colour="black",
fill="white",
position = "identity") +
geom_density(alpha = 0.2,
aes(color = sex)
) +
scale_color_manual(values = c("#868686FF", "#EFC000FF"))
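The rescaling described above can be reproduced without ggplot2. The following is a minimal base R sketch, assuming that `..density..` is computed separately for each group once `color = sex` is mapped; the names `breaks`, `per_group`, and `joint` are illustrative and not part of the original post.

```r
# Base R sketch: per-group vs. joint normalisation of histogram bars.
set.seed(1234)
wdata <- data.frame(
  sex    = factor(rep(c("F", "M"), each = 200)),
  weight = c(rnorm(200, 55), rnorm(200, 58))
)

binwidth <- 0.25
# Shared breaks covering the full range of the data.
breaks <- seq(floor(min(wdata$weight)), ceiling(max(wdata$weight)), by = binwidth)

# Bin counts for each group on the shared breaks.
counts <- lapply(split(wdata$weight, wdata$sex),
                 function(w) hist(w, breaks = breaks, plot = FALSE)$counts)

# Per-group density: each group is divided by its own size.
per_group <- lapply(counts, function(k) k / (sum(k) * binwidth))
# Joint density: each group is divided by the total number of observations.
joint <- lapply(counts, function(k) k / (nrow(wdata) * binwidth))

# Area under the bars of each group, and of both groups together.
area_per_group <- sapply(per_group, function(d) sum(d) * binwidth)
area_joint     <- sum(sapply(joint, function(d) sum(d) * binwidth))
```

With per-group normalisation every group integrates to 1 on its own, so two groups together cover twice the area of the joint density curve. With joint normalisation the bars of all groups together integrate to 1.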
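EDIT:
As has been suggested, dividing by the number of groups in geom_histogram()'s aesthetics, i.e. y = ..density../2, approximates the solution. However, this only works for symmetric distributions, as in the first output below: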
a + geom_histogram(aes(y = ..density../2,
color = sex
),
colour="black",
fill="white",
position = "identity") +
geom_density(alpha = 0.2,
) +
scale_color_manual(values = c("#868686FF", "#EFC000FF"))
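yields
Less symmetric distributions, however, can cause trouble with this approach. See the plots below, where y = ..density../5 was used for 5 groups. The original comes first, then the manipulated version (with position = "stack"):
Because the distribution is heavier on the left, dividing by 5 underestimates the left side and overestimates the right side.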
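One way to read the mismatch: dividing the per-group density by the number of groups gives every group the same total area, which agrees with the joint normalisation only when all groups have the same size. The sketch below uses hypothetical data (two groups of 900 and 100 observations, not the 5-group data from the question) to show the difference in base R.

```r
# Hypothetical data: two groups of very different size.
set.seed(1)
x <- c(rnorm(900, 0), rnorm(100, 3))
g <- factor(rep(c("big", "small"), times = c(900, 100)))

binwidth <- 0.5
breaks <- seq(floor(min(x)), ceiling(max(x)), by = binwidth)

# Bin counts for each group on shared breaks.
counts <- lapply(split(x, g),
                 function(v) hist(v, breaks = breaks, plot = FALSE)$counts)
n_groups <- length(counts)

# Area per group when the per-group density is divided by the number of groups.
area_divided <- sapply(counts, function(k) {
  sum(k / (sum(k) * binwidth) / n_groups) * binwidth
})

# Area per group when counts are divided by the total number of observations.
area_joint <- sapply(counts, function(k) {
  sum(k / (length(x) * binwidth)) * binwidth
})
```

Here `area_divided` assigns an area of 0.5 to each group, while `area_joint` assigns 0.9 and 0.1, which is each group's actual share of the data. The small group is therefore drawn too tall and the big group too short.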
EDIT 2: SOLUTION
As suggested by Andrew, the following (complete) code solves the problem:
library(ggplot2)
set.seed(1234)
wdata = data.frame(
sex = factor(rep(c("F", "M"), each = 200)),
weight = c(rnorm(200, 55), rnorm(200, 58))
)
binwidth <- 0.25
a <- ggplot(wdata,
aes(x = weight,
# Pass binwidth to aes() so it will be found in
# geom_histogram()'s aes() later
binwidth = binwidth))
# Basic plot w/o colouring according to 'sex'
a + geom_histogram(aes(y = ..density..),
binwidth = binwidth,
colour = "black",
fill = "white",
position = "stack") +
geom_density(alpha = 0.2) +
scale_color_manual(values = c("#868686FF", "#EFC000FF")) +
# Use fixed scale for sake of comparability
scale_x_continuous(limits = c(52, 61)) +
scale_y_continuous(limits = c(0, 0.25))
# Plot w/ colouring according to 'sex'
a + geom_histogram(aes(x = weight,
# binwidth will only be found if passed to
# ggplot()'s aes() (as above)
y = ..count.. / (sum(..count..) * binwidth),
color = sex),
binwidth = binwidth,
fill="white",
position = "stack") +
geom_density(alpha = 0.2) +
scale_color_manual(values = c("#868686FF", "#EFC000FF")) +
# Use fixed scale for sake of comparability
scale_x_continuous(limits = c(52, 61)) +
scale_y_continuous(limits = c(0, 0.25)) +
guides(color = "none")
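Note: binwidth = binwidth needs to be passed to ggplot()'s aes(), otherwise the pre-specified binwidth will not be found inside geom_histogram()'s aes().
Furthermore, position = "stack" is specified so that the two versions of the histogram are comparable. The plots for the dummy data and for the more complex distribution are shown below:
Solved. Thanks for your help!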
I don't think you can do it using y=..density.., but you can recreate the same thing like this...
binwidth <- 0.25 #easiest to set this manually so that you know what it is
a + geom_histogram(aes(y = ..count.. / (sum(..count..) * binwidth),
color = sex),
binwidth = binwidth,
fill="white",
position = "identity") +
geom_density(alpha = 0.2) +
scale_color_manual(values = c("#868686FF", "#EFC000FF"))
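A possible variant of the same idea, sketched under the assumption that ggplot2 3.3.0 or later is installed: `after_stat()` replaces the `..count..` notation, and the computed variable `width` from the binning stat stands in for the manually defined `binwidth`, so nothing has to be passed through `ggplot()`'s `aes()`. The check at the end (`bars`, `total_area`) is illustrative only.

```r
library(ggplot2)

set.seed(1234)
wdata <- data.frame(
  sex    = factor(rep(c("F", "M"), each = 200)),
  weight = c(rnorm(200, 55), rnorm(200, 58))
)

p <- ggplot(wdata, aes(x = weight)) +
  geom_histogram(
    # Normalise by the total count over all groups, not per group.
    aes(y = after_stat(count / (sum(count) * width)), colour = sex),
    binwidth = 0.25,
    fill = "white",
    position = "identity"
  ) +
  # Joint density curve for both groups together.
  geom_density(alpha = 0.2) +
  scale_colour_manual(values = c("#868686FF", "#EFC000FF"))

# Sanity check: the bars of all groups together should cover an area of 1.
bars <- layer_data(p, 1)
total_area <- sum(bars$y * (bars$xmax - bars$xmin))
```

Calling `print(p)` draws the plot. With `position = "identity"` the bars of the two groups overlap; `position = "stack"` stacks them instead, as in the question's final code.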