How to create a kernel density estimation with R?

I would like to program a kernel density estimate by hand (with the Epanechnikov kernel^1, for example). I tried the following code^2, plotting my manual estimate (blue) and the built-in density() estimate (red) on the same figure (see the attached image), but the two density curves always differ!


1: The analytic form of the Epanechnikov kernel is: kappa(u) = (1 - u^2), with support |u| <= 1, where u = (x - x_{i})/h.


2: My trial code:

x= faithful$eruptions

fit2 <- density(x, bw = 0.6, kernel = "epanechnikov")

xgrid = seq(-1, 8, 0.1)

kernelEpan <- function(x, obs, h) sum((1-((x-obs)/h)^2)*(abs(x-obs)<=h))/h

plot(xgrid, sapply(xgrid, FUN = kernelEpan, obs = faithful$eruptions, h = 0.6)/length(faithful$eruptions), type = "l", col = "blue")

lines(fit2, col = "red")

[Image: the manual estimate (blue) and the density() estimate (red) on the same figure]

If you read the docs for the bw argument of the density function, you will see:

bw: the smoothing bandwidth to be used. The kernels are scaled such that this is the standard deviation of the smoothing kernel.

This means that for your function's h parameter to match the behaviour of the bw parameter, you need to rescale h by multiplying it by sqrt(5): an Epanechnikov kernel with support [-a, a] has standard deviation a/sqrt(5), so a standard deviation of bw corresponds to a half-width of bw * sqrt(5).
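The sqrt(5) factor can be checked numerically. The following is only a sketch: the helper name epan is hypothetical (it is not part of R), and it assumes the usual normalized Epanechnikov form with the constant 3/4. It integrates the kernel with half-width bw * sqrt(5) and should return a total mass close to 1 and a standard deviation close to bw.

```r
# Hypothetical helper: normalized Epanechnikov kernel with half-width a
epan <- function(u, a) ifelse(abs(u) <= a, 0.75 * (1 - (u / a)^2) / a, 0)

bw <- 0.6
a  <- bw * sqrt(5)   # half-width implied by a standard deviation of bw

# Total mass and variance by numerical integration over the support
mass     <- integrate(function(u) epan(u, a), -a, a)$value
variance <- integrate(function(u) u^2 * epan(u, a), -a, a)$value

c(mass = mass, sd = sqrt(variance))
```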

I would be tempted to vectorize your function, which also lets you normalize it accurately:

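kernelEpan <- function(xvals, obs, h) {

  # density() treats bw as the kernel's standard deviation, so the
  # Epanechnikov half-width is sqrt(5) times larger
  h <- h * sqrt(5)

  # Unnormalized kernel sum at each grid point
  dens <- sapply(xvals, function(x) {
    u <- abs(x - obs) / h
    u <- ifelse(u > 1, 1, u)  # points outside the support contribute 0
    sum(1 - u^2)
  })

  # Rescale so the curve integrates to 1 over the grid (Riemann sum)
  dens / sum(dens * mean(diff(xvals)))
}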
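As an alternative to normalizing on the grid, the estimate can be written in closed form, since the normalized Epanechnikov kernel carries a constant of 3/4. This is a minimal sketch rather than a replacement for the function above: the name kdeEpan is hypothetical, and its bw argument is assumed to follow the density() convention (standard deviation of the kernel).

```r
# Hypothetical helper: Epanechnikov KDE with bw interpreted as in density()
kdeEpan <- function(xvals, obs, bw) {
  a <- bw * sqrt(5)                  # kernel half-width
  u <- outer(xvals, obs, "-") / a    # one row per grid point, one column per observation
  k <- 0.75 * pmax(1 - u^2, 0)       # kernel values, zero outside |u| <= 1
  rowSums(k) / (length(obs) * a)
}

xgrid <- seq(-1, 8, 0.1)
est   <- kdeEpan(xgrid, obs = faithful$eruptions, bw = 0.6)

# Riemann sum over the grid; should be close to 1
sum(est) * 0.1
```

Because the constant is built in, the result does not depend on the grid spacing or on the grid covering the whole support, at the cost of building a length(xvals) by length(obs) matrix.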

This allows:

fit1 <- kernelEpan(xgrid, obs = faithful$eruptions, h = 0.6)

fit2 <- density(x, bw = 0.6, kernel = "epanechnikov")

plot(xgrid, fit1, type = "l", col = "blue")

lines(fit2, col = "red")

[Image: the resulting plot]
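The agreement between the two curves can also be checked numerically instead of by eye. The sketch below repeats the kernelEpan function from this answer in compact form so that it runs on its own. density() evaluates on its own grid, so its output is interpolated onto xgrid with approx(); treating the density as 0 outside that grid is an assumption of this sketch.

```r
x     <- faithful$eruptions
xgrid <- seq(-1, 8, 0.1)

# Compact copy of the answer's function (grid-normalized Epanechnikov estimate)
kernelEpan <- function(xvals, obs, h) {
  h <- h * sqrt(5)
  dens <- sapply(xvals, function(x) sum(1 - pmin(abs(x - obs) / h, 1)^2))
  dens / sum(dens * mean(diff(xvals)))
}

fit1 <- kernelEpan(xgrid, obs = x, h = 0.6)
fit2 <- density(x, bw = 0.6, kernel = "epanechnikov")

# Interpolate the built-in estimate onto xgrid, using 0 outside its range
builtin <- approx(fit2$x, fit2$y, xout = xgrid, yleft = 0, yright = 0)$y

# Largest pointwise gap between the two curves; expected to be small
max(abs(fit1 - builtin))
```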
