简体   繁体   中英

Gaussian kernel density estimation in R

I am having trouble understanding how to implement a Gaussian kernel density estimation of the following dataset in R. I appreciate if you can help me understand the mechanism of how to do it. I am currently trying to get a formula for the bell shaped curves at the bottom of the following picture. As you can see there is one bell shaped curve for each data point. (Note the picture does not represent the data I am using.)

在此处输入图像描述

This is my data:

x<-c(4.09, 4.46, 4.61, 4.30, 4.03, 5.22, 4.21, 4.07, 4.02, 4.58, 4.66, 4.05, 4.23, 5.51, 4.03, 4.72, 4.47, 4.50, 5.80, 4.30, 4.09, 4.78, 4.18, 4.45, 4.40, 5.60, 4.37, 4.42, 4.88, 4.20, 4.45, 4.10, 4.43, 4.58, 4.40, 4.38) (x has 36 elements)

This is the kernel density estimator:

在此处输入图像描述

(If you can't see the image, it's from this page http://sfb649.wiwi.hu-berlin.de/fedc_homepage/xplore/tutorials/xlghtmlnode33.html )

where K(u)= 在此处输入图像描述

is the Gaussian kernel function and h=.1516 is the bandwidth selected by Scott.

So, plugging in we get f hat (x) = 1/(36*.1516) (1/sqrt(2pi))[e^(-1/2 ((4.09-x)/.1516)^2 + e^(-1/2 ((4.46-x)/.1516)^2 +... + e^(-1/2 ((4.38-x)/.1516)^2]

Ok. So we have a function of x. But how do we get the equation of each of the bell shaped curves in the above diagram? If we plug in, for example, 4.09, into f hat (x) we get a number, not a curve/function/distribution. Can someone help me understand the procedure to find the equation for the bell shaped curve/kernel density estimate?

Here's a function that will return your fhat function given your x values and h value

get_fhat <- function(x, h) {
  Vectorize(function(z) 1/length(x)/h*sum(dnorm((x-z)/h)))  
}

This function returns a function that we can use to get values. We Vectorize it so we can pass in multiple values at once to the function.

We can get a single value or plot it with

fhat <- get_fhat(x, .1516)
fhat(4.09)
# [1] 0.9121099
curve(fhat, from=min(x), to=max(x))

在此处输入图像描述

Graph

## Given data
x  <- c(4.09, 4.46, 4.61, 4.30, 4.03, 5.22, 4.21, 4.07, 4.02, 4.58, 4.66, 4.05, 
        4.23, 5.51, 4.03, 4.72, 4.47, 4.50, 5.80, 4.30, 4.09, 4.78, 4.18, 4.45, 
        4.40, 5.60, 4.37, 4.42, 4.88, 4.20, 4.45, 4.10, 4.43, 4.58, 4.40, 4.38)
h  <- 0.1516 

# GaussianKernel
GK <- function(u) {(1/sqrt(2*pi))*exp(-(u^2)/2)} # or dnorm(u)

This function gives a similar plot.

DensityGraph <- function(x, h){
  n    <- length(x)
  xi   <- seq(min(x) - sd(x), max(x) + sd(x), length.out = 512)
  # fhat without sum since we are interest in the bell shaped curves
  fhat <- sapply(x, function(y){(1/(n*h))*GK((xi - y)/h)})
  # histogram of x
  hist (x, freq = FALSE, nclass = 15, main = "Kernel density with histogram",
        xlab = paste("N = ", n, "   ", "Bandwidth = ", h))
  # add fhat with sum
  lines(xi, rowSums(fhat), lwd = 2)
  # add the bell shaped curves
  apply(fhat, 2, function(j) lines(xi, j, col = 4))
  # show data points
  rug  (x, lwd = 2, col = 2)
}


DensityGraph(x = x, h = 0.05)

具有 x 直方图的核密度图

Blue bell shaped curves represent each data point of x

DensityGraph(x = x, h = 0.1516)

Compare with built in density function in R

lines(density(x = x, bw = 0.1516), col = 3, lwd = 2)

fhat for each x

This function gives the value of fhat given a specific x

fhat <- function(x, h, specific_x){
  n    <- length(x)
  xi   <- seq(min(x) - sd(x), max(x) + sd(x), length.out = 512)
  f    <- rowSums(sapply(x, function(y){(1/(n*h))*GK((xi - y)/h)}))
  kde  <- data.frame(xi, fhat = f)
  indx <- which.min(abs(xi - specific_x))
  fx   <- kde[indx, "fhat"]
  list(fx = fx, kde = kde)
}

KernelDensity <- fhat(x = x, h = 0.1516, specific_x = 4.09)
KernelDensity$fx
# [1] 0.9114677
plot(KernelDensity$kde, type  = "l", lwd = 2, xlab = "")
title(xlab = paste("N = ", n, "    Bandwidth = ", h))
rug(x, lwd = 2, col = 2)

Compare built in density function

lines(density(x, bw = 0.1516), col = 5) 

阴谋

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM