简体   繁体   中英

Sample point probability from density surface

I created a 2D density surface:

library(MASS)
a <- data$x
b <- data$y
f1 <- kde2d(a, b, n = 100)
filled.contour(f1)

在此处输入图像描述

I want to determine if a sample point lies within the central 80% of the density surface. Is there a way to sample the contour map for Σ p > 0.8? I don't need the probability of a single point (like in this example ), but rather where the point lies in the probability distribution.

EDIT: Using the very helpful answer from user2554330, I created a map of my actual data points. I have a bimodal distribution. Can I still use this approach?

在此处输入图像描述

Essentially what you want to do needs two steps: first, find the contour of the estimated density such that 80% of the points fall within that contour. And then find the density at each point to see if it is higher than that contour.

We don't have your data variable, so I'll fake one:

data <- data.frame(x = rnorm(200), y = rnorm(200))
library(MASS)
a <- data$x
b <- data$y
f1 <- kde2d(a, b, n = 100)
filled.contour(f1)

For the first step, you can use the result of kde2d as follows. It returns a matrix of density values in f1$z . These will be density values, approximately proportional to the probability of a point falling in the rectangle corresponding to that matrix entry. So to find the contour value, do this:

total <- sum(f1$z)
sorted <- sort(as.numeric(f1$z), decreasing = TRUE)
cumulative <- cumsum(sorted/total)
contourlevel <- sorted[min(which(cumulative > 0.80))]

For the second step, you need to create a function which approximates the result given by kde2d . The fields::interp.surface function can do that.

densities <- fields::interp.surface(f1, data)

Check that we got the contour level right:

table(densities > contourlevel)
plot(data, col = ifelse(densities > contourlevel, "green", "red"))

Here are the results:

data <- data.frame(x = rnorm(1000), y = rnorm(1000))
library(MASS)
a <- data$x
b <- data$y
f1 <- kde2d(a, b, n = 100)
filled.contour(f1)

total <- sum(f1$z)
sorted <- sort(as.numeric(f1$z), decreasing = TRUE)
cumulative <- cumsum(sorted/total)
contourlevel <- sorted[min(which(cumulative > 0.80))]

densities <- fields::interp.surface(f1, data)

table(densities > contourlevel)
#> 
#> FALSE  TRUE 
#>   167   833
plot(data, col = ifelse(densities > contourlevel, "green", "red"))

Created on 2021-02-10 by the reprex package (v0.3.0)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM