Using the "density" function in R to find the probability density of a new data point

Question

I am trying to estimate the PDF of continuous data with an unknown distribution, using the "density" function in R. Given a new data point, I want to find its probability density based on the kernel density estimate returned by "density". How can I do that?

Answer 1

If your new point lies within the range of x values produced by density, this is fairly easy. I'd suggest using approx (or approxfun if you need it as a function) to interpolate between the grid values.

Here's an example:

set.seed(2937107)
x <- rnorm(10,30,3)
dx <- density(x)
xnew <- 32.137
approx(dx$x,dx$y,xout=xnew)

If we plot the density and the new point, we can see it does what you need:

[Figure: the estimated density with the new point marked]
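If the lookup is needed more than once, the approxfun variant mentioned above may be more convenient. The following is a minimal sketch that reuses the example data; the name f_hat is only illustrative and does not come from the original answer.

```r
set.seed(2937107)
x  <- rnorm(10, 30, 3)
dx <- density(x)

# Build a reusable interpolating function from the density grid.
# With the default rule = 1 it returns NA outside range(dx$x).
f_hat <- approxfun(dx$x, dx$y)

xnew <- 32.137
f_hat(xnew)                # same value as approx(dx$x, dx$y, xout = xnew)$y
f_hat(c(28, 30, 32.137))   # vectorized over several new points
```

Because f_hat only interpolates linearly between the grid points that density() evaluated, its accuracy depends on the grid resolution (the n argument of density).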

This returns NA if the new value would require extrapolation. If you want to handle extrapolation, I'd suggest computing the KDE directly at that point (using the bandwidth from the KDE you already have).
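A rough sketch of that direct computation, assuming the default Gaussian kernel was used (for which the bandwidth is the standard deviation of the kernel). The helper name kde_at and the test point far are made up for illustration.

```r
set.seed(2937107)
x  <- rnorm(10, 30, 3)
dx <- density(x)                   # Gaussian kernel by default

# Hypothetical helper: evaluate the Gaussian KDE exactly at arbitrary points,
# reusing the bandwidth that density() selected (dx$bw).
kde_at <- function(t, data, bw) {
  sapply(t, function(ti) mean(dnorm(ti, mean = data, sd = bw)))
}

far <- max(dx$x) + 5               # a point beyond the density() grid
approx(dx$x, dx$y, xout = far)$y   # NA: interpolation cannot extrapolate
kde_at(far, x, dx$bw)              # tiny but positive density
```

If a different kernel was passed to density(), dnorm would have to be swapped for the matching kernel function.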

Answer 2

This question is a year old, but here is a complete solution nevertheless. Let

d <- density(xs)

and define h = d$bw. Your KDE estimate is completely determined by

the elements of xs,
the bandwidth h,
the type of kernel function.

Given a new value t, you can compute the corresponding y(t) using the following function, which assumes you used Gaussian kernels for the estimation.
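For reference, these three ingredients enter the standard kernel density estimator as follows. The symbols n (the length of xs), x_i (the i-th element of xs), and K (the kernel) are notation added here, and the second line shows the Gaussian case only.

```latex
y(t) = \frac{1}{n h} \sum_{i=1}^{n} K\!\left( \frac{t - x_i}{h} \right),
\qquad
K(u) = \frac{1}{\sqrt{2\pi}} \, e^{-u^{2}/2}
```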

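# Evaluate the Gaussian KDE at a single new value t.
# Relies on xs (the data) and h (the bandwidth, d$bw) being defined
# in the enclosing environment, e.g. the global workspace.
myKDE <- function(t){
    kernelValues <- numeric(length(xs))
    for(i in seq_along(xs)){
        # Scaled distance from t to the i-th observation
        transformed <- (t - xs[i]) / h
        # Gaussian kernel contribution, scaled by 1/h
        kernelValues[i] <- dnorm(transformed, mean = 0, sd = 1) / h
    }
    # Average the kernel contributions over all observations
    return(sum(kernelValues) / length(xs))
}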

myKDE simply computes y(t) from the definition of the kernel density estimate.
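One way to sanity-check a definition-based evaluator like this is to compare it with the grid that density() returns and to confirm that it integrates to roughly 1. The sketch below assumes a Gaussian kernel and uses simulated data; kde_exact, grid_gap, and total are illustrative names, not part of the original answer.

```r
set.seed(1)
xs <- rnorm(50, mean = 30, sd = 3)   # simulated sample, for illustration only
d  <- density(xs)                    # Gaussian kernel by default
h  <- d$bw

# Vectorized equivalent of the definition-based evaluator.
kde_exact <- function(t) {
  sapply(t, function(ti) mean(dnorm((ti - xs) / h)) / h)
}

# density() works on a binned grid, so expect close but not
# bit-identical agreement with the exact evaluation.
grid_gap <- max(abs(kde_exact(d$x) - d$y))
grid_gap

# The estimate should integrate to roughly 1; a wide finite range
# around the data is used instead of the whole real line.
total <- integrate(kde_exact,
                   lower = min(xs) - 10 * h,
                   upper = max(xs) + 10 * h)$value
total
```

A small grid_gap and a total close to 1 suggest the evaluator and density() are describing the same estimate.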

Answer 3

See: docs

dnorm(data_point, its_mean, its_stdev)

3 answers

Answer 1: score 5, 2015-01-21 22:46:18

Answer 2: score 4, 2016-01-08 17:02:10

Answer 3: score -2, 2015-01-21 21:59:41
