
Calculate probability of point on 2d density surface

If I calculate the 2D density surface of two vectors, as in this example:

library(MASS)
a <- rnorm(1000)
b <- rnorm(1000, sd=2)
f1 <- kde2d(a, b, n = 100)

I get the following surface:

filled.contour(f1)

[Image: filled contour plot of f1]

The z-value is the estimated density.
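For reference, a small sketch of what kde2d returns. The set.seed() call is an addition for reproducibility, and looking up the nearest grid point is just one illustrative way to read a value off the surface: x and y are the grid coordinates and z[i, j] is the estimated density at (x[i], y[j]).

```r
library(MASS)

set.seed(1)                       # added so the sketch is reproducible
a <- rnorm(1000)
b <- rnorm(1000, sd = 2)
f1 <- kde2d(a, b, n = 100)

# A list with x (100 grid points), y (100 grid points) and a 100 x 100 matrix z
str(f1)

# Density (not probability) at the grid point nearest to (1, -4)
i <- which.min(abs(f1$x - 1))
j <- which.min(abs(f1$y - (-4)))
f1$z[i, j]
```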

My question now is: is it possible to calculate the probability of a single point, e.g. a = 1, b = -4?

(As I'm not a statistician, this may be the wrong wording; sorry for that. I would like to know, if this is possible at all, with which probability a point occurs.)

Thanks for any comments!

If you specify an area, then that area has a probability with respect to your density function. A single point, of course, has probability zero. But the density at that point is non-zero. So what is that density?
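To make this concrete, here is a sketch that assumes the generating model from the question (independent a ~ N(0, 1) and b ~ N(0, 2^2)), so exact probabilities are available through pnorm. The helper name square_prob is hypothetical. As the side h of a square centred on (1, -4) shrinks, its probability goes to zero, while probability divided by area settles on the true density dnorm(1) * dnorm(-4, sd = 2).

```r
# Sketch, assuming the question's generating model: a ~ N(0, 1) and
# b ~ N(0, 2^2), drawn independently (rnorm(1000) and rnorm(1000, sd = 2)).

# Hypothetical helper: exact probability that (a, b) falls in the
# square of side h centred on (x0, y0).
square_prob <- function(h, x0 = 1, y0 = -4) {
  px <- pnorm(x0 + h / 2) - pnorm(x0 - h / 2)
  py <- pnorm(y0 + h / 2, sd = 2) - pnorm(y0 - h / 2, sd = 2)
  px * py   # independence: the joint probability is the product
}

h <- c(1, 0.1, 0.01, 0.001)          # shrinking side lengths
p <- square_prob(h)                  # probability of each square
ratio <- p / h^2                     # probability per unit area
true_density <- dnorm(1) * dnorm(-4, sd = 2)

print(data.frame(h = h, prob = p, prob_per_area = ratio))
print(true_density)
```

The prob column heads towards 0, while prob_per_area converges to true_density (about 0.0065).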

The density at a point is the limit, as the area around that point shrinks to zero, of the integral of the probability density over that area divided by the size of the area. (That was actually rather hard to state correctly; it needed a few tries and is still not optimal.)
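Written out for a square of side h centred on the point (a_0, b_0), with f the joint density, the limit reads as follows (this holds at points where f is continuous):

```latex
$$
f(a_0, b_0)
= \lim_{h \to 0} \frac{P\left(|a - a_0| \le \tfrac{h}{2},\ |b - b_0| \le \tfrac{h}{2}\right)}{h^2}
= \lim_{h \to 0} \frac{1}{h^2}
  \int_{a_0 - h/2}^{a_0 + h/2} \int_{b_0 - h/2}^{b_0 + h/2} f(a, b)\, \mathrm{d}b\, \mathrm{d}a
$$
```

The numerator goes to zero (a single point has probability zero), but the ratio converges to the density.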

All of this is really basic calculus. It is also fairly easy to write a routine that calculates the integral of the density over an area, although I imagine MASS has standard ways to do it that use more sophisticated integration techniques. Here is a quick routine I threw together based on your example:

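library(MASS)
n <- 100                                  # grid size passed to kde2d
a <- rnorm(1000)
b <- rnorm(1000, sd=2)
f1 <- kde2d(a, b, n = n)
lims <- c(min(a),max(a),min(b),max(b))    # same as kde2d's default lims

filled.contour(f1)

# Approximate P(xmin < a < xmax, ymin < b < ymax) as the rectangle's area
# times the mean estimated density over the grid points inside it.
prob <- function(f,xmin,xmax,ymin,ymax,n,lims){
  # kde2d puts grid point i at lims[1] + (i-1)*(lims[2]-lims[1])/(n-1);
  # invert that mapping, round to the nearest index and clamp to 1..n.
  ixmin <- max( 1, round(1 + (n-1)*(xmin-lims[1])/(lims[2]-lims[1])) )
  ixmax <- min( n, round(1 + (n-1)*(xmax-lims[1])/(lims[2]-lims[1])) )
  iymin <- max( 1, round(1 + (n-1)*(ymin-lims[3])/(lims[4]-lims[3])) )
  iymax <- min( n, round(1 + (n-1)*(ymax-lims[3])/(lims[4]-lims[3])) )
  avg <- mean(f$z[ixmin:ixmax,iymin:iymax])
  probval <- (xmax-xmin)*(ymax-ymin)*avg
  return(probval)
}
# The outputs below come from one run without set.seed(), so your values will differ.
prob(f1,0.5,1.5,-4.5,-3.5,n,lims)
# [1] 0.004788993
prob(f1,-1,1,-1,1,n,lims)
# [1] 0.2224353
prob(f1,-2,2,-2,2,n,lims)
# [1] 0.5916984
prob(f1,0,1,-1,1,n,lims)
# [1] 0.119455
prob(f1,1,2,-1,1,n,lims)
# [1] 0.05093696
prob(f1,-3,3,-3,3,n,lims)
# [1] 0.8080565
lims
# [1] -3.081773  4.767588 -5.496468  7.040882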

Caveat: the routine seems right and gives reasonable answers, but it has not undergone anywhere near the scrutiny I would give a production function.
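One cheap piece of scrutiny, sketched here under the assumption that the data really come from independent a ~ N(0, 1) and b ~ N(0, 2^2) as in the question, is to compare the routine with the exact rectangle probabilities from pnorm. The copy of prob() below maps coordinates to grid indices the way kde2d lays out its grid; exact_prob is a hypothetical helper. Kernel smoothing spreads mass outward, so estimates for central rectangles tend to come out a little low.

```r
library(MASS)

set.seed(1)                                  # reproducible sketch
n <- 100
a <- rnorm(1000)
b <- rnorm(1000, sd = 2)
f1 <- kde2d(a, b, n = n)
lims <- c(min(a), max(a), min(b), max(b))

# Copy of the routine: rectangle area times mean density on the grid
# points inside it (index i sits at lims[1] + (i - 1) * step).
prob <- function(f, xmin, xmax, ymin, ymax, n, lims) {
  ixmin <- max(1, round(1 + (n - 1) * (xmin - lims[1]) / (lims[2] - lims[1])))
  ixmax <- min(n, round(1 + (n - 1) * (xmax - lims[1]) / (lims[2] - lims[1])))
  iymin <- max(1, round(1 + (n - 1) * (ymin - lims[3]) / (lims[4] - lims[3])))
  iymax <- min(n, round(1 + (n - 1) * (ymax - lims[3]) / (lims[4] - lims[3])))
  (xmax - xmin) * (ymax - ymin) * mean(f$z[ixmin:ixmax, iymin:iymax])
}

# Hypothetical helper: exact probability of a rectangle under the
# assumed model (independent normals, so the probability factorises).
exact_prob <- function(xmin, xmax, ymin, ymax) {
  (pnorm(xmax) - pnorm(xmin)) * (pnorm(ymax, sd = 2) - pnorm(ymin, sd = 2))
}

rects <- list(c(-1, 1, -1, 1), c(-2, 2, -2, 2), c(0, 1, -1, 1))
est   <- sapply(rects, function(r) prob(f1, r[1], r[2], r[3], r[4], n, lims))
truth <- sapply(rects, function(r) exact_prob(r[1], r[2], r[3], r[4]))
print(round(cbind(estimate = est, exact = truth), 3))
```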

The z-value here is called a "probability density" rather than a "probability". As the comments have pointed out, if you want an estimated probability you need to integrate the estimated density, i.e. find the volume under your estimated surface.
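A minimal way to do that integration is a plain Riemann sum over the grid kde2d returns. This is an assumption of the sketch, not a MASS function, and grid_prob is a hypothetical name. Each value z[i, j] stands for a cell of size dx by dy, so summing z * dx * dy over the grid points inside a region approximates that region's probability. Over the whole grid the total should come out close to 1 (slightly less, since a little mass lies outside the data range).

```r
library(MASS)

set.seed(42)                      # reproducible sketch
a <- rnorm(1000)
b <- rnorm(1000, sd = 2)
f1 <- kde2d(a, b, n = 100)

# Hypothetical helper: Riemann-sum estimate of
# P(xmin <= a <= xmax, ymin <= b <= ymax) from the kde2d grid.
grid_prob <- function(f, xmin, xmax, ymin, ymax) {
  dx <- diff(f$x[1:2])            # kde2d uses an evenly spaced grid
  dy <- diff(f$y[1:2])
  ix <- f$x >= xmin & f$x <= xmax # grid points inside the rectangle
  iy <- f$y >= ymin & f$y <= ymax
  sum(f$z[ix, iy]) * dx * dy      # volume under the surface
}

total <- grid_prob(f1, -Inf, Inf, -Inf, Inf)    # whole surface
est   <- grid_prob(f1, -1, 1, -1, 1)
exact <- (pnorm(1) - pnorm(-1)) * (pnorm(1, sd = 2) - pnorm(-1, sd = 2))
print(c(total = total, estimate = est, exact = exact))
```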

However, if what you want is the probability density at a particular point, then you can use: 但是,如果您想要的是特定点的概率密度,那么您可以使用:

kde2d(a, b, n=1, lims=c(1, 1, -4, -4))$z[1,1]
# [1] 0.006056323

This calculates a 1×1 "grid" containing a single density estimate for the point you want.
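To see what that single value is, here is a sketch that recomputes it by hand. It assumes kde2d's default bandwidths (bandwidth.nrd for each variable, which kde2d divides by 4 to get the standard deviations of its product Gaussian kernel). The helper kde_at is hypothetical; it evaluates the same estimate at arbitrary points without building a grid.

```r
library(MASS)

set.seed(1)
a <- rnorm(1000)
b <- rnorm(1000, sd = 2)

# Hypothetical helper: product-Gaussian KDE evaluated at points (x0[k], y0[k]),
# using kde2d's default bandwidths scaled by 1/4.
kde_at <- function(x0, y0, x, y) {
  h <- c(bandwidth.nrd(x), bandwidth.nrd(y)) / 4
  sapply(seq_along(x0), function(k) {
    sum(dnorm((x0[k] - x) / h[1]) * dnorm((y0[k] - y) / h[2])) /
      (length(x) * h[1] * h[2])
  })
}

z_manual <- kde_at(1, -4, a, b)
z_kde2d  <- kde2d(a, b, n = 1, lims = c(1, 1, -4, -4))$z[1, 1]
all.equal(z_manual, z_kde2d)

# Several points at once: the centre and the point from the question
kde_at(c(0, 1), c(0, -4), a, b)
```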


A plot confirming that it worked:

z0 <- kde2d(a, b, n=1, lims=c(1, 1, -4, -4))$z[1,1]

filled.contour(
    f1,
    plot.axes = {
        contour(f1, levels=z0, add=TRUE)
        abline(v=1, lty=3)
        abline(h=-4, lty=3)
        axis(1); axis(2)
    }
)

[Image: filled contour plot of f1 with the contour at level z0 and dotted lines at a = 1 and b = -4]
