
Preventing wrong density plots when coloring histograms according to groups

Based on some dummy data, I created a histogram with a density plot:

library(ggplot2)
set.seed(1234)
wdata = data.frame(
  sex = factor(rep(c("F", "M"), each=200)),
  weight = c(rnorm(200, 55), rnorm(200, 58))
)
a <- ggplot(wdata, aes(x = weight))

a + geom_histogram(aes(y = ..density..,
                       # color = sex
                       ), 
                   colour="black",
                   fill="white",
                   position = "identity") +
  geom_density(alpha = 0.2,
               # aes(color = sex)
               ) +
  scale_color_manual(values = c("#868686FF", "#EFC000FF"))

Basic result

The histogram of weight should be colored according to sex, so I use aes(y = ..density.., color = sex) for geom_histogram():

a + geom_histogram(aes(y = ..density..,
                       color = sex
                       ), 
                   colour="black",
                   fill="white",
                   position = "identity") +
  geom_density(alpha = 0.2,
               # aes(color = sex)
               ) +
  scale_color_manual(values = c("#868686FF", "#EFC000FF"))

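Scaled individual histograms (not wanted)

As I hoped, the density plot stays the same (one curve for both groups combined), but the histogram is scaled up (it now seems to be computed for each group separately):

How can I prevent this from happening? I need individually colored histogram bars, but a joint density plot across all colored groups.

P.S. Using aes(color = sex) in geom_density() brings everything back to the original scale, but I do not want individual density plots (as shown below):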

a + geom_histogram(aes(y = ..density..,
                       color = sex
                       ), 
                   colour="black",
                   fill="white",
                   position = "identity") +
  geom_density(alpha = 0.2,
               aes(color = sex)
               ) +
  scale_color_manual(values = c("#868686FF", "#EFC000FF"))

Individual densities (not wanted)
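The change of scale can be reproduced without ggplot2. The following is a minimal base R sketch, assuming the same dummy data and a hypothetical bin width of 0.25 (the names breaks, counts, per_group and joint are illustrative only). It compares bar heights normalised within each group, which is what ..density.. yields once color = sex creates groups, with bar heights normalised over all observations.

```r
set.seed(1234)
wdata <- data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = c(rnorm(200, 55), rnorm(200, 58))
)

binwidth <- 0.25  # assumed bin width, chosen for illustration
# Common bin edges covering all observations
breaks <- seq(floor(min(wdata$weight)), ceiling(max(wdata$weight)),
              by = binwidth)

# Bin counts per group: one row per bin, one column per level of sex
counts <- sapply(split(wdata$weight, wdata$sex),
                 function(w) hist(w, breaks = breaks, plot = FALSE)$counts)

# Normalised within each group (per-group density)
per_group <- sweep(counts, 2, colSums(counts), "/") / binwidth
# Normalised over all observations (comparable to the joint density curve)
joint <- counts / (sum(counts) * binwidth)

colSums(per_group) * binwidth  # each group's bars cover an area of 1
sum(joint) * binwidth          # all bars together cover an area of 1
```

With two equally sized groups, every per-group bar is exactly twice as tall as its jointly normalised counterpart, which matches the upward jump seen in the plot.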

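Edit:

As has been suggested, dividing by the number of groups in geom_histogram()'s aesthetics, i.e. y = ..density../2, can approximate the solution. However, this only works for symmetric distributions, as in the first output below: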

a + geom_histogram(aes(y = ..density../2,
                       color = sex
                       ), 
                   colour="black",
                   fill="white",
                   position = "identity") +
  geom_density(alpha = 0.2,
               ) +
  scale_color_manual(values = c("#868686FF", "#EFC000FF"))

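which yields

Solution

Less symmetric distributions, however, can cause trouble with this method. See the plots below, where y = ..density../5 was used for 5 groups. First the original, then the manipulated version (with position = "stack"):

Original

Divided by 5

Because the distribution is heavier on the left, dividing by 5 underestimates the left side and overestimates the right side.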
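One way to see where the mismatch comes from: a per-group density equals count_g / (n_g * binwidth), while the jointly normalised height is count_g / (N * binwidth), so the exact correction factor for group g is n_g / N. Dividing by the number of groups is therefore exact only when all groups have the same size. The base R sketch below uses hypothetical, unequally sized groups "A" and "B" (not the data from the plots above) to illustrate this.

```r
set.seed(1234)
# Hypothetical unequal group sizes: 300 observations in "A", 100 in "B"
g <- factor(rep(c("A", "B"), times = c(300, 100)))
x <- c(rnorm(300, 55), rnorm(100, 58))

binwidth <- 0.25
breaks <- seq(floor(min(x)), ceiling(max(x)), by = binwidth)
counts <- sapply(split(x, g),
                 function(v) hist(v, breaks = breaks, plot = FALSE)$counts)

n_g <- colSums(counts)  # group sizes
N <- sum(counts)        # total number of observations

per_group <- sweep(counts, 2, n_g, "/") / binwidth  # per-group density
joint <- counts / (N * binwidth)                    # jointly normalised

# Exact correction: weight each group's density by its share n_g / N
exact <- sweep(per_group, 2, n_g / N, "*")
# Approximation discussed above: divide by the number of groups
by_k <- per_group / nlevels(g)

all.equal(exact, joint)   # the share-weighted version matches
max(abs(by_k - joint))    # dividing by the group count does not
```

Here the larger group is drawn too low and the smaller group too high when dividing by the number of groups, the same kind of under- and overestimation described above.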

Edit 2: Solution

As suggested by Andrew, the following (complete) code solves the problem:

library(ggplot2)
set.seed(1234)
wdata = data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = c(rnorm(200, 55), rnorm(200, 58))
)

binwidth <- 0.25
a <- ggplot(wdata,
            aes(x = weight,
                # Pass binwidth to aes() so it will be found in
                # geom_histogram()'s aes() later
                binwidth = binwidth))

# Basic plot w/o colouring according to 'sex'
a + geom_histogram(aes(y = ..density..),
                   binwidth = binwidth,
                   colour = "black",
                   fill = "white",
                   position = "stack") +
  geom_density(alpha = 0.2) +
  scale_color_manual(values = c("#868686FF", "#EFC000FF")) +
  # Use fixed scale for sake of comparability
  scale_x_continuous(limits = c(52, 61)) +
  scale_y_continuous(limits = c(0, 0.25))


# Plot w/ colouring according to 'sex'
a + geom_histogram(aes(x = weight,
                       # binwidth will only be found if passed to
                       # ggplot()'s aes() (as above)
                       y = ..count.. / (sum(..count..) * binwidth),
                       color = sex),
                   binwidth = binwidth,
                   fill="white",
                   position = "stack") +
  geom_density(alpha = 0.2) +
  scale_color_manual(values = c("#868686FF", "#EFC000FF")) +
  # Use fixed scale for sake of comparability
  scale_x_continuous(limits = c(52, 61)) +
  scale_y_continuous(limits = c(0, 0.25)) +
  # "none" hides the colour legend; FALSE is deprecated in newer ggplot2
  guides(color = "none")

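Note: binwidth = binwidth needs to be passed to ggplot()'s aes(); otherwise the pre-specified binwidth will not be found by geom_histogram()'s aes(). In addition, position = "stack" is specified so that both versions of the histogram are comparable. Plots for the dummy data and for a more complex distribution are shown below:

Correct, ungrouped, simple data

Correct, grouped, simple data

Correct, ungrouped, more complex distribution

Correct, grouped, more complex distribution

Solved - thanks for your help!

I don't think you can do this with y = ..density.., but you can recreate the same thing like this...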

binwidth <- 0.25 #easiest to set this manually so that you know what it is

a + geom_histogram(aes(y = ..count.. / (sum(..count..) * binwidth),
                       color = sex), 
                   binwidth = binwidth,
                   fill="white",
                   position = "identity") +
    geom_density(alpha = 0.2) +
    scale_color_manual(values = c("#868686FF", "#EFC000FF"))
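
For newer ggplot2 versions, the same idea can be sketched with after_stat(), which supersedes the ..count.. notation. This is only a sketch, assuming ggplot2 3.3.0 or later and the dummy data from the question. It uses the width variable computed by the binning stat instead of an external binwidth object, and layer_data() is used purely to check the result.

```r
library(ggplot2)  # assumes ggplot2 >= 3.3.0, where after_stat() is available

set.seed(1234)
wdata <- data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = c(rnorm(200, 55), rnorm(200, 58))
)

p <- ggplot(wdata, aes(x = weight)) +
  # count, sum(count) and width are evaluated on the binned data of all
  # groups together, so the bars are normalised jointly
  geom_histogram(aes(y = after_stat(count / (sum(count) * width)),
                     color = sex),
                 binwidth = 0.25,
                 fill = "white",
                 position = "identity") +
  geom_density(alpha = 0.2) +
  scale_color_manual(values = c("#868686FF", "#EFC000FF"))

# Check: the bars of both groups together should cover an area of 1
bars <- layer_data(p, 1)
sum(bars$y * (bars$xmax - bars$xmin))
```

Calling print(p) draws the plot. Because the bin width is taken from the stat's own output, it does not have to be passed through ggplot()'s aes() in this variant.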

