Sample point probability from density surface
I created a 2D density surface:
library(MASS)
a <- data$x
b <- data$y
f1 <- kde2d(a, b, n = 100)
filled.contour(f1)
I want to determine if a sample point lies within the central 80% of the density surface. Is there a way to sample the contour map for Σ p > 0.8? I don't need the probability of a single point (like in this example), but rather where the point lies in the probability distribution.
EDIT: Using the very helpful answer from user2554330, I created a map of my actual data points. I have a bimodal distribution. Can I still use this approach?
Essentially, what you want to do needs two steps. First, find the contour of the estimated density that encloses about 80% of the estimated probability, so that roughly 80% of the points fall within it. Then find the density at each point to see whether it is higher than that contour level.
We don't have your data variable, so I'll fake one:
data <- data.frame(x = rnorm(200), y = rnorm(200))
library(MASS)
a <- data$x
b <- data$y
f1 <- kde2d(a, b, n = 100)
filled.contour(f1)
For the first step, you can use the result of kde2d as follows. It returns a matrix of density values in f1$z. These will be density values, approximately proportional to the probability of a point falling in the rectangle corresponding to that matrix entry. So to find the contour value, do this:
total <- sum(f1$z)
sorted <- sort(as.numeric(f1$z), decreasing = TRUE)
cumulative <- cumsum(sorted/total)
contourlevel <- sorted[min(which(cumulative > 0.80))]
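The same calculation can be wrapped in a small function so that other coverage levels are easy to try. This is only a sketch: `hdr_level` is a hypothetical helper name (it is not part of MASS), the seed is arbitrary, and it assumes the equally spaced grid that kde2d produces, so that every cell covers the same area.

```r
library(MASS)

# Hypothetical helper (not part of MASS): returns the density level whose
# upper level set holds roughly `prob` of the estimated probability.
# Assumes kde$z sits on an equally spaced grid, as returned by kde2d.
hdr_level <- function(kde, prob = 0.80) {
  z <- sort(as.numeric(kde$z), decreasing = TRUE)
  cumulative <- cumsum(z) / sum(z)   # share of total mass, densest cells first
  z[which(cumulative > prob)[1]]     # first cell that pushes the share past prob
}

set.seed(1)  # arbitrary seed, only for reproducibility
f <- kde2d(rnorm(200), rnorm(200), n = 100)

# Levels for the central 50%, 80% and 95% regions
levels <- sapply(c(0.5, 0.8, 0.95), hdr_level, kde = f)
levels
```

A larger coverage needs a lower density threshold, so the returned levels should decrease as `prob` increases.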
For the second step, you need to create a function which approximates the result given by kde2d. The fields::interp.surface function can do that.
densities <- fields::interp.surface(f1, data)
Check that we got the contour level right:
table(densities > contourlevel)
plot(data, col = ifelse(densities > contourlevel, "green", "red"))
Here are the results:
data <- data.frame(x = rnorm(1000), y = rnorm(1000))
library(MASS)
a <- data$x
b <- data$y
f1 <- kde2d(a, b, n = 100)
filled.contour(f1)
total <- sum(f1$z)
sorted <- sort(as.numeric(f1$z), decreasing = TRUE)
cumulative <- cumsum(sorted/total)
contourlevel <- sorted[min(which(cumulative > 0.80))]
densities <- fields::interp.surface(f1, data)
table(densities > contourlevel)
#>
#> FALSE TRUE
#> 167 833
plot(data, col = ifelse(densities > contourlevel, "green", "red"))
Created on 2021-02-10 by the reprex package (v0.3.0)
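To test a sample point that was not used to fit the density, the same two steps apply. The sketch below makes some assumptions: the three points in `newpoints` are made up for illustration, the seed is arbitrary, and a point outside the kde2d grid (for which fields::interp.surface returns NA) is treated as lying outside the 80% region.

```r
library(MASS)

set.seed(42)  # arbitrary seed, only for reproducibility
data <- data.frame(x = rnorm(1000), y = rnorm(1000))
f1 <- kde2d(data$x, data$y, n = 100)

# Step 1: density level enclosing about 80% of the estimated probability
sorted <- sort(as.numeric(f1$z), decreasing = TRUE)
cumulative <- cumsum(sorted) / sum(sorted)
contourlevel <- sorted[which(cumulative > 0.80)[1]]

# Step 2: interpolate the density at new (hypothetical) points.
# One row per point; columns are x and y.
newpoints <- rbind(c(0, 0), c(2.5, 2.5), c(10, 10))
d <- fields::interp.surface(f1, newpoints)

# Points off the grid come back as NA; count them as outside the region
inside <- !is.na(d) & d > contourlevel
inside
```

By default kde2d builds its grid over the range of the data, so a point far from all observations cannot be interpolated. Passing wider `lims` to kde2d would be one way to cover such points explicitly.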