Simulate from kernel density estimator with variable underlying grid
I have a dataset that I'm using to create an empirical probability distribution by estimating a kernel density. Right now I'm using R's kde2d from the MASS package. After estimating the probability distribution, I use sample to sample from slices of the 2D distribution along the x-axis. I use sample much like described here. Example code would look like this:
library(MASS)
set.seed(123)
x = rnorm(100, 1, 0.1)
set.seed(456)
y = rnorm(100, 1, 0.5)
den <- kde2d(x, y, n = 50, lims = c(-2, 2, -2, 2))
#to plot this 2d kde:
#library(lattice)
#persp(den)
conditional_probabilty_density = list(x = den$y, y = den$z[40, ])
#to plot the slice:
#plot(conditional_probabilty_density)
simulated_sample = sample(conditional_probabilty_density$x, size = 10, replace = TRUE, prob = conditional_probabilty_density$y)
The den looks like this: [figure: plot of den]
My data has known areas where there are a lot of fluctuations, requiring a fine grid granularity. Other areas have basically no data points and nothing is going on there. I would be fine if I could just set the n parameter of kde2d to a very high number in order to have a good resolution of my data everywhere. Alas, this is not possible due to memory constraints.
That's why I thought I could modify the kde2d function to have a non-constant granularity.
Here is the source code of the kde2d function. One can modify the line
gy <- seq.int(lims[3L], lims[4L], length.out = n[2L])
and put whatever granularity is wished for on the y-axis. For example:
a <- seq(-1, 0, 0.5)
gy <- c(a, seq.int(0.1, 2, length.out = n[2L]-length(a)))
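Instead of editing the function body each time, the same change could be packaged as a small wrapper that takes the grids as arguments. The following is only a sketch: kde2d_grid is a hypothetical helper, not part of MASS, and it assumes the Gaussian product kernel and the h/4 bandwidth rescaling used in the kde2d source.

```r
library(MASS)  # provides bandwidth.nrd (and kde2d, for comparison)

# Hypothetical helper (not part of MASS): same computation as kde2d,
# but the evaluation grids gx and gy are supplied by the caller.
kde2d_grid <- function(x, y, gx, gy,
                       h = c(bandwidth.nrd(x), bandwidth.nrd(y))) {
  nx <- length(x)
  stopifnot(length(y) == nx, all(is.finite(c(x, y, gx, gy))))
  h <- rep(h, length.out = 2L)
  stopifnot(all(h > 0))
  h <- h / 4  # kde2d rescales the bandwidths the same way
  ax <- outer(gx, x, "-") / h[1L]
  ay <- outer(gy, y, "-") / h[2L]
  z <- tcrossprod(matrix(dnorm(ax), , nx), matrix(dnorm(ay), , nx)) /
    (nx * h[1L] * h[2L])
  list(x = gx, y = gy, z = z)
}

set.seed(123)
x <- rnorm(100, 1, 0.1)
set.seed(456)
y <- rnorm(100, 1, 0.5)

# Uniform x grid, non-uniform y grid built as in the example above.
gx <- seq(-2, 2, length.out = 50)
a <- seq(-1, 0, 0.5)
gy <- c(a, seq(0.1, 2, length.out = 50 - length(a)))
den_var <- kde2d_grid(x, y, gx, gy)
```

On an evenly spaced grid this should reproduce the z matrix of kde2d, which is a convenient sanity check before switching to a custom grid.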
And the modified kde2d returns the kernel density estimate at the specified positions. Works very well. Suppose I have now
The problem is, I can no longer use sample to sample from slices along the x-axis, because the part on the left side of the distribution is much finer and thus has a higher probability of being sampled by sample.
What can I do to have a fine grid where I need it, but sample from the distribution according to its proper densities? Thank you a lot.
Use approx on conditional_probabilty_density with a new n.
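A minimal sketch of that suggestion, under some assumptions: the non-uniform grid gy, the slice position x0 and the value n_new are made up for illustration, and the slice is computed directly with the same Gaussian kernel and bandwidth scaling as kde2d rather than with a modified kde2d. approx interpolates the slice linearly onto n evenly spaced points, so every point again represents the same width and sample can be used as before.

```r
library(MASS)  # provides bandwidth.nrd

set.seed(123)
x <- rnorm(100, 1, 0.1)
set.seed(456)
y <- rnorm(100, 1, 0.5)

# Hypothetical non-uniform y grid: coarse on [-2, 0], fine on (0, 2].
gy <- c(seq(-2, 0, by = 0.5), seq(0.05, 2, by = 0.05))

# Slice of the 2D kernel density at a fixed x0, evaluated on gy.
x0 <- 1
h <- c(bandwidth.nrd(x), bandwidth.nrd(y)) / 4
wx <- dnorm((x0 - x) / h[1])
slice <- as.vector(dnorm(outer(gy, y, "-") / h[2]) %*% wx) /
  (length(x) * h[1] * h[2])
conditional_probabilty_density <- list(x = gy, y = slice)

# Interpolate onto n_new equally spaced points spanning range(gy).
n_new <- 500
uniform <- approx(conditional_probabilty_density$x,
                  conditional_probabilty_density$y, n = n_new)

# Equal spacing again, so the density values can serve as weights.
simulated_sample <- sample(uniform$x, size = 10, replace = TRUE,
                           prob = uniform$y)
```

Because only a single 1D slice is interpolated, n_new can be large without the memory cost of a fine 2D grid. An alternative that avoids interpolation is to keep the original grid and pass the density multiplied by the local grid spacing as prob.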