Discrepancies in the density() kernel estimator compared to calculations from scratch

I am trying to calculate the Gaussian kernel density, and to test my knowledge of the density() function, I decided to calculate it from scratch and compare the two results.

However, they do not give the same answer.

I start with an existing dataset

xi <- mtcars$mpg

and can plot the kernel density of this data as follows

plot(density(xi, kernel = "gaussian"))

which gives this:

[Figure: automatic Gaussian kernel density]

I then grab some of the details from this calculation so that my own calculation is consistent with it.

auto.dens <- density(xi, kernel = "gaussian")
h <- auto.dens$bw # bandwidth for kernel
x0 <- auto.dens$x # points for prediction

I then calculate the Gaussian kernel density myself. I have done this in a loop so that it is clearer to read.

fx0 <- NULL

for (j in 1:length(x0)){

    t <- abs(x0[j]-xi)/h

    K <- (1/sqrt(2*pi))*exp(-(t^2)/2)

    fx0 <- c(fx0,sum(K*t)/(length(t)*h))
}

The basic calculation follows section 3.3.6 of Statistical Methods in the Atmospheric Sciences, 3rd Edition, by Daniel Wilks, using equation 3.13 from the textbook:

$$\hat{f}(x_0) = \frac{1}{n h} \sum_{i=1}^{n} K(t_i)$$

with the Gaussian kernel set as

$$K(t) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{t^2}{2}\right)$$

and t being

$$t_i = \frac{|x_0 - x_i|}{h}$$

Here is my problem, however.

I then plot the two together:

plot(y=fx0,x=x0, type="l", ylim=c(0,0.07))
lines(x=auto.dens$x, y=auto.dens$y, col="red")

Plotting the output from the density function (red) against my calculation (black), I get

[Figure: density() output (red) and my calculation (black)]

These two calculations are clearly different!

Have I misunderstood how the density function works? Why can't I calculate the same results from scratch? Why is my kernel estimator giving different results? Why are my results less smooth?

I need to construct and apply a kernel smoother (not just of density) to a much more complicated dataset. I only did this little example to make sure I was doing the same as the automated functions, and I really wasn't expecting to have this problem. I've tried all kinds of things and just cannot see why I get a different result.

Thank you all in advance for reading, and for any comments, little or big.

Edit: 13:40 29/11/2016 Solution as detailed in the answer below.

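You don't need sum(K*t), just sum(K). The estimator sums the kernel values themselves; multiplying each one by t weights it by its scaled distance from x0, which is not part of the formula.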

xi <- mtcars$mpg
plot(density(xi, kernel = "gaussian"), lwd = 2)

auto.dens <- density(xi, kernel = "gaussian")
h <- auto.dens$bw # bandwidth for kernel
x0 <- auto.dens$x # points for prediction

fx0 <- NULL
for (j in 1:length(x0)) {
  t <- abs(x0[j]-xi)/h
  K <- (1/sqrt(2*pi))*exp(-(t^2)/2)
  fx0 <- c(fx0, sum(K)/(length(t)*h))
}

lines(x0, fx0, col = "red", lty = "dotted")
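As a rough cross-check of the corrected loop, the same sum can be written without a loop. The sketch below assumes the Gaussian kernel is the standard normal density, so dnorm() stands in for the hand-written K; the names t_mat, fx0_vec, fx0_bug, max_gap, area and area_bug are introduced here for illustration only. Note that density() works on a binned, FFT-based approximation, so agreement with the direct sum should be close rather than exact.

```r
xi <- mtcars$mpg
auto.dens <- density(xi, kernel = "gaussian")
h  <- auto.dens$bw   # bandwidth chosen by density()
x0 <- auto.dens$x    # evaluation grid used by density()
n  <- length(xi)

# Row j, column i holds the scaled difference (x0[j] - xi[i]) / h
t_mat <- outer(x0, xi, "-") / h

# dnorm(t) equals (1/sqrt(2*pi)) * exp(-t^2/2), i.e. the Gaussian kernel K(t)
fx0_vec <- rowSums(dnorm(t_mat)) / (n * h)

# The original (incorrect) version multiplied each kernel value by |t|
fx0_bug <- rowSums(dnorm(t_mat) * abs(t_mat)) / (n * h)

# Largest absolute gap between the direct sum and density()
max_gap <- max(abs(fx0_vec - auto.dens$y))

# Crude Riemann-sum areas over the evaluation grid (evenly spaced)
dx <- diff(x0)[1]
area     <- sum(fx0_vec) * dx
area_bug <- sum(fx0_bug) * dx

print(c(max_gap = max_gap, area = area, area_bug = area_bug))
```

A density estimate should integrate to roughly 1 over a grid that covers the data, so comparing area with area_bug is a quick way to spot this kind of mistake even without a reference implementation to plot against.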
