
Find the probability density of a new data point using the "density" function in R

I am trying to find the best PDF for continuous data with an unknown distribution, using the "density" function in R. Given a new data point, I want to find its probability density based on the kernel density estimate returned by "density". How can I do that?

If your new point lies within the range of x-values produced by density, this is fairly easy to do: I'd suggest using approx (or approxfun if you need it as a function) to handle the interpolation between the grid values.

Here's an example:

set.seed(2937107)
x <- rnorm(10,30,3)
dx <- density(x)
xnew <- 32.137
approx(dx$x,dx$y,xout=xnew)

If we plot the density and the new point, we can see it's doing what you need.
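A minimal sketch that redraws such a plot from the example data. It uses approxfun, the function-returning variant mentioned earlier; the names f and ynew, and the plot styling, are illustrative choices rather than part of the original answer.

```r
set.seed(2937107)
x <- rnorm(10, 30, 3)
dx <- density(x)
xnew <- 32.137

# Build a linear interpolator over the density grid, then evaluate it at xnew
f <- approxfun(dx$x, dx$y)
ynew <- f(xnew)

# Draw the estimated density and mark the interpolated point on the curve
plot(dx, main = "Kernel density estimate with the new point")
points(xnew, ynew, col = "red", pch = 19)
abline(v = xnew, lty = 2)
```

Because f is an ordinary function, it can be reused for any number of new points, for example f(c(29, 30, 31)).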


This will return NA if the new value falls outside the grid and would need to be extrapolated. If you want to handle extrapolation, I'd suggest computing the KDE at that point directly (using the bandwidth from the KDE you already have).
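A sketch of that direct computation, assuming the default Gaussian kernel, for which the bandwidth reported by density is the standard deviation of the kernel. The helper name kde_at and the test point far are hypothetical.

```r
set.seed(2937107)
x <- rnorm(10, 30, 3)
dx <- density(x)   # default settings: Gaussian kernel

# Hypothetical helper: average of Gaussian kernels centred on each data point,
# evaluated at every element of t
kde_at <- function(t, data, bw) {
  vapply(t, function(ti) mean(dnorm(ti, mean = data, sd = bw)), numeric(1))
}

far <- max(dx$x) + 1                  # a point just beyond the density grid
approx(dx$x, dx$y, xout = far)$y      # NA: approx does not extrapolate by default
kde_at(far, x, dx$bw)                 # small but positive density
```

Inside the grid the two approaches should agree closely, though not exactly, since density evaluates the estimate on a grid with a binned approximation.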

This question is a year old, but here is a complete solution nevertheless. Let's call

d <- density(xs)

and define h = d$bw. Your KDE is completely determined by

  • the elements of xs,
  • the bandwidth h,
  • the type of kernel function.

Given a new value t, you can compute the corresponding y(t) using the following function, which assumes you have used Gaussian kernels for the estimation.
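A short sketch of where these ingredients can be read off in practice. The sample xs below is made-up data for illustration. As far as I know, the object returned by density exposes the bandwidth as bw and the sample size as n, while the kernel appears only in the recorded call when it was passed explicitly.

```r
set.seed(1)
xs <- rnorm(50)    # illustrative data, not from the question
d <- density(xs)   # defaults: Gaussian kernel, bw = "nrd0"

h <- d$bw          # bandwidth actually used
n <- d$n           # number of observations used
d$call             # the call; a non-default kernel would show up here
```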

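myKDE <- function(t) {
    # Evaluate the Gaussian KDE at a single point t, using xs and h defined above
    kernelValues <- numeric(length(xs))
    for (i in seq_along(xs)) {
        # Distance from t to the i-th observation, in units of the bandwidth
        transformed <- (t - xs[i]) / h
        # Standard normal kernel, rescaled by the bandwidth
        kernelValues[i] <- dnorm(transformed, mean = 0, sd = 1) / h
    }
    sum(kernelValues) / length(xs)
}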

myKDE computes y(t) directly from the definition.
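For reference, the definition being applied is presumably the standard Gaussian kernel density estimator. The symbols n (the length of xs), x_i (the i-th element of xs) and \phi (the standard normal density) are notation introduced here, not taken from the answer.

```latex
\hat{f}_h(t) = \frac{1}{n h} \sum_{i=1}^{n} \phi\!\left(\frac{t - x_i}{h}\right),
\qquad
\phi(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^{2}/2}
```

Each term of the sum corresponds to one pass through the loop in myKDE, and the factor 1/n is the final division by length(xs).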

See: docs

dnorm(data_point, its_mean, its_stdev)

