简体   繁体   中英

Calculate probability of point on 2d density surface

If I calculate the 2d density surface of two vectors like in this example:

library(MASS)
a <- rnorm(1000)
b <- rnorm(1000, sd=2)
f1 <- kde2d(a, b, n = 100)

I get the following surface

filled.contour(f1)

在此输入图像描述

The z-value is the estimated density.

My question now is: Is it possible to calculate the probability of a single point, eg a = 1, b = -4

[as I'm not a statistician this is maybe the wrong wording. Sorry for that. I would like to know - if this is possible at all - with which probability a point occurs.]

Thanks for every comment!

If you specify an area, then that area has a probability with respect to your density function. Of course a single point does not have a probability different from zero. But it does have a non-zero density at that point. What is that then?

The density is the limit of integral of that probability density integrated over the area divided by the normal area measure as the normal area measure goes to zero. (It was actual rather hard to state that correctly, needed a few tries and it is still not optimal).

All this is really basic calculus. It is also fairly easy to write a routine to calculate the integral of that density over the area, although I imagine MASS has standard ways to do it that use more sophisticated integration techniques. Here is a quick routine that I threw together based on your example:

library(MASS)
n <- 100
a <- rnorm(1000)
b <- rnorm(1000, sd=2)
f1 <- kde2d(a, b, n = 100)
lims <- c(min(a),max(a),min(b),max(b))

filled.contour(f1)

prob <- function(f,xmin,xmax,ymin,ymax,n,lims){
  ixmin <- max( 1, n*(xmin-lims[1])/(lims[2]-lims[1]) )
  ixmax <- min( n, n*(xmax-lims[1])/(lims[2]-lims[1]) )
  iymin <- max( 1, n*(ymin-lims[3])/(lims[4]-lims[3]) ) 
  iymax <- min( n, n*(ymax-lims[3])/(lims[4]-lims[3]) )
  avg <- mean(f$z[ixmin:ixmax,iymin:iymax])
  probval <- (xmax-xmin)*(ymax-ymin)*avg
  return(probval)
}
prob(f1,0.5,1.5,-4.5,-3.5,n,lims)
# [1] 0.004788993
prob(f1,-1,1,-1,1,n,lims)
# [1] 0.2224353
prob(f1,-2,2,-2,2,n,lims)
# [1] 0.5916984
prob(f1,0,1,-1,1,n,lims)
# [1] 0.119455
prob(f1,1,2,-1,1,n,lims)
# [1] 0.05093696
prob(f1,-3,3,-3,3,n,lims)
# [1] 0.8080565
lims
# [1] -3.081773  4.767588 -5.496468  7.040882

Caveat, the routine seems right and is giving reasonable answers, but it has not undergone anywhere near the scrutiny I would give it for a production function.

The z-value here is a called a "probability density" rather than a "probability". As comments have pointed out, if you want an estimated probability you will need to integrate the estimated density to find the volume under your estimated surface.

However, if what you want is the probability density at a particular point, then you can use:

kde2d(a, b, n=1, lims=c(1, 1, -4, -4))$z[1,1]
# [1] 0.006056323

This will calculate a 1x1 "grid" with a single density estimate for the point you want.


A plot confirming that it worked:

z0 <- kde2d(a, b, n=1, lims=c(1, 1, -4, -4))$z[1,1]

filled.contour(
    f1,
    plot.axes = {
        contour(f1, levels=z0, add=TRUE)
        abline(v=1, lty=3)
        abline(h=-4, lty=3)
        axis(1); axis(2)
    }
)

在此输入图像描述

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM