Kernel Density Estimate (Probability Density Function) is wrong?

I've created a histogram of the age at which serial killers first killed and have tried to superimpose a probability density function on it. However, when I use the geom_density() function in ggplot2, the density curve looks far too small (area < 1). What is strange is that changing the bin width of the histogram also seems to change the density function: the smaller the bin width, the better the curve appears to fit. Does anyone have guidance on how to make the density function fit better, and on why its area appears to be so far below 1?

# Histograms for Age of First Kill:
library(ggplot2)
AFKH <- ggplot(df, aes(AgeFirstKill,fill = cut(AgeFirstKill, 100))) +
  geom_histogram(aes(y=..count../sum(..count..)), show.legend = FALSE, binwidth = 3) + # density wasn't working, so had to use ..count../sum(..count..)
  scale_fill_discrete(h = c(200, 10), c = 100, l = 60) + # c = for color, l = for brightness; h = c() changes the color gradient
  theme(axis.title=element_text(size=22,face="bold"), 
        plot.title = element_text(size=30, face = "bold"),
        axis.text.x = element_text(face="bold", size=14),
        axis.text.y = element_text(face="bold", size=14)) +
  labs(title = "Age of First kill",x = "Age of First Kill", y = "Density")+
  geom_density(aes(AgeFirstKill, y = ..density..), alpha = 0.7, fill = "white",lwd =1, stat = "density")
AFKH

Bin width = 1

Bin width = 3

We don't have your data set, so let's make one that's reasonably close to it:

set.seed(3)
df <- data.frame(AgeFirstKill = rgamma(100, 3, 0.2) + 10)

The first thing to notice is that the density curve doesn't change. Look carefully at the y axis on your plots. You will notice that the peak of the density curve stays at about 0.06 in both. It's the height of the histogram bars that changes, and the y axis rescales accordingly.
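One way to convince yourself of this without plotting is to compute the kernel density estimate directly. This is only a sketch: it uses base R's density() with its default settings on the simulated data, and the names age, kde, area and peak are made up for illustration.

```r
# Simulated stand-in for the real data (an assumption, same recipe as above)
set.seed(3)
age <- rgamma(100, 3, 0.2) + 10

# Kernel density estimate; no histogram bin width is involved anywhere
kde <- density(age)

# Approximate the area under the curve with the trapezoidal rule
area <- sum(diff(kde$x) * (head(kde$y, -1) + tail(kde$y, -1)) / 2)

# Height of the highest point of the curve
peak <- max(kde$y)
```

Because nothing in this calculation depends on how the histogram is binned, area stays close to 1 and peak stays the same whichever bin width you later choose for the bars.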

The reason for this is that you aren't dividing the height of the histogram bars by their width to preserve their area. Your y aesthetic should be ..count../sum(..count..)/binwidth so that the total area of the bars stays at 1.
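The arithmetic behind this can be checked numerically. The following is a rough sketch in base R, not what ggplot2 does internally: bar_area() is a hypothetical helper, and the breaks it builds will not match ggplot2's bin alignment exactly, though the area argument is the same.

```r
# Simulated stand-in for the real data (an assumption)
set.seed(3)
age <- rgamma(100, 3, 0.2) + 10

# Hypothetical helper: total bar area under the two scalings
bar_area <- function(binwidth) {
  # Equal-width breaks that cover the whole range of the data
  breaks <- seq(floor(min(age)), ceiling(max(age)) + binwidth, by = binwidth)
  counts <- hist(age, breaks = breaks, plot = FALSE)$counts
  proportion <- counts / sum(counts)      # count / sum(count)
  dens <- proportion / binwidth           # count / sum(count) / binwidth
  c(proportion = sum(proportion * binwidth),  # area = height * width
    density = sum(dens * binwidth))
}
```

Calling bar_area(1), bar_area(3) and bar_area(7) shows that the proportion-scaled bars have a total area equal to the bin width, while the density-scaled bars always have area 1. That is why the bars tower over the density curve as the bins get wider.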

To show this, let's wrap your plotting code in a function that lets you specify the bin width and also takes that bin width into account when computing the bar heights:

draw_it <- function(bw) {
  ggplot(df, aes(AgeFirstKill,fill = cut(AgeFirstKill, 100))) +
  geom_histogram(aes(y=..count../sum(..count..)/bw), show.legend = FALSE, 
                 binwidth = bw) +
  scale_fill_discrete(h = c(200, 10), c = 100, l = 60) + 
  theme(axis.title=element_text(size=22,face="bold"), 
        plot.title = element_text(size=30, face = "bold"),
        axis.text.x = element_text(face="bold", size=14),
        axis.text.y = element_text(face="bold", size=14)) +
  labs(title = "Age of First kill",x = "Age of First Kill", y = "Density") +
  geom_density(aes(AgeFirstKill, y = ..density..), alpha = 0.7, 
               fill = "white",lwd =1, stat = "density")
}

And now we can do:

draw_it(bw = 1)


draw_it(bw = 3)


draw_it(bw = 7)

