
How to overlap histogram and density plot with numbers on Y-axis instead of density

I have a histogram created in ggplot2, and I'd like to overlay a density line for the same data. Importantly, I don't want to convert the histogram to density values; I want to keep N (counts) on the y-axis. Is there a way to overlay the histogram and the density plot without transforming the histogram, by scaling up the density curve instead?

The histogram for this data:

[Image: histogram of the data]

The initial density plot for the same data:

[Image: density plot of the same data]

The desired overlay, but with density on the Y-axis instead of counts:

[Image: histogram with density curve overlaid, density on the Y-axis]

You'll want to use the ..count.. variable computed by stat_density (the density multiplied by the number of observations), and then scale it by the bin width.

library(ggplot2)
set.seed(15)
df <- data.frame(x=rnorm(500, sd=10))
ggplot(df, aes(x=x)) + 
  geom_histogram(colour="black", fill="white", binwidth = 5 ) +
  # ..count.. is density * n; multiplying by the bin width (5) puts the curve on the count scale
  geom_density(aes(y=..count..*5), alpha=.2, fill="#FF6666") 

[Image: histogram with the scaled density curve overlaid]
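For reference, a minimal sketch of the same plot for ggplot2 3.3.0 and later, where after_stat(count) is the current spelling of ..count.. (the dot-dot form was deprecated in ggplot2 3.4.0). The bin width is stored once in bin_w, an illustrative variable name, so the histogram and the scale factor cannot drift apart.

```r
library(ggplot2)

set.seed(15)
df <- data.frame(x = rnorm(500, sd = 10))
bin_w <- 5  # histogram bin width, reused as the density scale factor

p <- ggplot(df, aes(x = x)) +
  geom_histogram(colour = "black", fill = "white", binwidth = bin_w) +
  # count = density * n, so count * bin_w is the expected count per bin
  geom_density(aes(y = after_stat(count * bin_w)), alpha = 0.2, fill = "#FF6666")
print(p)
```

Changing bin_w now updates both layers together.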

Yes, but you have to choose the right scale factor. Since you did not provide any data, I will illustrate with the built-in iris data.

H = hist(iris$Sepal.Width, main="")

[Image: base R histogram of iris$Sepal.Width]

Since the heights are the frequency counts, the sum of the heights equals the number of points, which is nrow(iris). The area of the histogram (the boxes) is the sum of the heights times the width of the boxes (all boxes have the same width here), so

  Area = nrow(iris) * (H$breaks[2] - H$breaks[1])

In this case it is 150 * 0.2 = 30, but it is better to keep it as a formula.
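The area argument can be checked numerically. This sketch assumes the default hist() breaks, which are equally spaced for this data; plot = FALSE only computes the bins without drawing them.

```r
H <- hist(iris$Sepal.Width, plot = FALSE)

widths <- diff(H$breaks)
bin_w <- widths[1]
# The argument assumes equal-width bins; default hist() breaks give that here
stopifnot(isTRUE(all.equal(widths, rep(bin_w, length(widths)))))

total_count <- sum(H$counts)           # equals nrow(iris)
hist_area   <- sum(H$counts * widths)  # total area of the boxes
cat("count:", total_count, " bin width:", bin_w, " area:", hist_area, "\n")
```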

The area under a density curve is always one, so the scale factor that makes the two areas equal is nrow(iris) * (H$breaks[2] - H$breaks[1]). Where do you apply the scale factor?

DENS = density(iris$Sepal.Width)
str(DENS)
List of 7
 $ x        : num [1:512] 1.63 1.64 1.64 1.65 1.65 ...
 $ y        : num [1:512] 0.000244 0.000283 0.000329 0.000379 0.000436 ...
 $ bw       : num 0.123
 $ n        : int 150
 $ call     : language density.default(x = iris$Sepal.Width)
 $ data.name: chr "iris$Sepal.Width"
 $ has.na   : logi FALSE

We want to scale the y values of the density curve, so we use:

DENS$y = DENS$y * nrow(iris) * (H$breaks[2] - H$breaks[1])

and then add the curve to the histogram:

lines(DENS)

[Image: histogram with density curve]
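To confirm that the scaled curve and the histogram now enclose the same area, the sketch below integrates the curve with the trapezoid rule before and after scaling. trap_area() is a hypothetical helper written for this check, not a base R function.

```r
H <- hist(iris$Sepal.Width, main = "")
bin_w <- H$breaks[2] - H$breaks[1]
DENS <- density(iris$Sepal.Width)

# Hypothetical helper: trapezoid-rule area under a curve sampled on a grid
trap_area <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

area_before <- trap_area(DENS$x, DENS$y)  # close to 1
DENS$y <- DENS$y * nrow(iris) * bin_w
area_after <- trap_area(DENS$x, DENS$y)   # close to nrow(iris) * bin_w
lines(DENS)
```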

You can make the curve follow the histogram a little more closely by reducing the bandwidth of the density estimate (adjust=0.7 multiplies the default bandwidth by 0.7):

H = hist(iris$Sepal.Width, main="")
DENS = density(iris$Sepal.Width, adjust=0.7)
DENS$y = DENS$y * nrow(iris) * (H$breaks[2] - H$breaks[1])
lines(DENS)

[Image: histogram with the adjusted density curve]
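If you do this often, the steps above can be wrapped in a small function. hist_with_density() is a hypothetical helper, not part of base R; it assumes equal-width bins, as the scale factor does, and forwards any extra arguments to hist().

```r
# hist_with_density() is a hypothetical helper, not part of base R
hist_with_density <- function(x, adjust = 1, ...) {
  x <- x[is.finite(x)]                 # drop NA/Inf, which density() rejects
  H <- hist(x, ...)                    # draw the histogram in counts
  widths <- diff(H$breaks)
  if (!isTRUE(all.equal(widths, rep(widths[1], length(widths))))) {
    stop("the count scaling assumes equal-width bins")
  }
  DENS <- density(x, adjust = adjust)  # adjust multiplies the default bandwidth
  DENS$y <- DENS$y * length(x) * widths[1]
  lines(DENS)
  invisible(list(hist = H, density = DENS))
}

res <- hist_with_density(iris$Sepal.Width, adjust = 0.7, main = "")
```

Returning both objects invisibly lets you inspect the bins or the scaled curve afterwards without printing them.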
