Simulating from a kernel density estimator with a variable underlying grid

Question

I have a dataset that I'm using to create an empirical probability distribution by estimating a kernel density. Right now I'm using R's kde2d from the MASS package. After estimating the probability distribution, I use sample to draw from slices of the 2D distribution along the x-axis, much as described here. Example code would look like this:

library(MASS)
set.seed(123)
x = rnorm(100, 1, 0.1)
set.seed(456)
y = rnorm(100, 1, 0.5)
den <- kde2d(x, y, n = 50, lims = c(-2, 2, -2, 2))
#to plot this 2d kde (persp() is base graphics; lattice is not needed):
#persp(den)
conditional_probabilty_density = list(x = den$y, y = den$z[40, ])
#to plot the slice:
#plot(conditional_probabilty_density)
simulated_sample = sample(conditional_probabilty_density$x, size = 10, replace = TRUE, prob = conditional_probabilty_density$y)

The estimated den looks like this:

My data has known regions with a lot of fluctuation, which require a fine grid. Other regions have essentially no data points, and nothing happens there. I would be fine if I could simply set the n parameter of kde2d to a very high number to get good resolution of my data everywhere. Alas, this is not possible due to memory constraints.
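For a sense of scale, here is a rough sketch of how kde2d's memory use grows with n, read off the arrays its source builds: outer(gx, x, "-") is n[1] x nx, outer(gy, y, "-") is n[2] x nx, and the result z is n[1] x n[2], all doubles of 8 bytes. The helper name kde2d_array_bytes is hypothetical, and it ignores temporaries such as dnorm(ax), so treat the figure as a lower bound.

```r
# Hypothetical helper, not part of MASS: bytes taken by the three main
# numeric arrays that kde2d() allocates, read off its source:
#   outer(gx, x, "-")  -> n[1] x nx
#   outer(gy, y, "-")  -> n[2] x nx
#   z                  -> n[1] x n[2]
# Each double takes 8 bytes. Temporaries such as dnorm(ax) are ignored,
# so the true peak usage is higher than this.
kde2d_array_bytes <- function(n, nx) {
  n <- as.numeric(rep(n, length.out = 2))  # kde2d recycles n the same way
  nx <- as.numeric(nx)
  8 * (n[1] * nx + n[2] * nx + n[1] * n[2])
}

kde2d_array_bytes(50, 100)     # 1e+05 bytes: the example in the question
kde2d_array_bytes(5000, 1e5)   # 8.2e+09 bytes, roughly 8 GB
```

Both the number of data points and n enter multiplicatively, which is why a uniformly fine grid quickly becomes infeasible.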

That's why I thought I could modify the kde2d function to use a non-constant granularity.
Here is the source code of the kde2d function. One can modify the line

gy <- seq.int(lims[3L], lims[4L], length.out = n[2L])

and set whatever granularity is wanted on the y-axis. For example:

a <- seq(-1, 0, 0.5)
gy <- c(a, seq.int(0.1, 2, length.out = n[2L]-length(a)))
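A minimal sketch of such a modified function, assuming the grid vectors are passed in directly instead of n and lims. The name kde2d_grid is hypothetical; its body mirrors kde2d's own computation (including kde2d's convention of dividing h by 4) and uses MASS only for bandwidth.nrd and for a consistency check against kde2d on a uniform grid.

```r
library(MASS)  # for bandwidth.nrd() and kde2d(), used in the check below

# Hypothetical variant of MASS::kde2d that evaluates the bivariate normal
# kernel density estimate on caller-supplied (possibly non-uniform) grids.
kde2d_grid <- function(x, y, gx, gy,
                       h = c(bandwidth.nrd(x), bandwidth.nrd(y))) {
  nx <- length(x)
  if (length(y) != nx) stop("data vectors must be the same length")
  h <- rep(h, length.out = 2L) / 4       # same bandwidth scaling as kde2d
  ax <- outer(gx, x, "-") / h[1L]        # length(gx) x nx
  ay <- outer(gy, y, "-") / h[2L]        # length(gy) x nx
  z <- tcrossprod(matrix(dnorm(ax), ncol = nx),
                  matrix(dnorm(ay), ncol = nx)) / (nx * h[1L] * h[2L])
  list(x = gx, y = gy, z = z)            # z[i, j] is the density at (gx[i], gy[j])
}

set.seed(123); x <- rnorm(100, 1, 0.1)
set.seed(456); y <- rnorm(100, 1, 0.5)

# On a uniform grid the sketch should reproduce kde2d
g50 <- seq.int(-2, 2, length.out = 50)
den_u <- kde2d(x, y, n = 50, lims = c(-2, 2, -2, 2))
den_g <- kde2d_grid(x, y, g50, g50)
all.equal(den_u$z, den_g$z)              # TRUE

# Non-uniform y grid, built as in the question
a <- seq(-1, 0, 0.5)
gy <- c(a, seq.int(0.1, 2, length.out = 50 - length(a)))
den_nu <- kde2d_grid(x, y, g50, gy)
```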

The modified kde2d then returns the kernel density estimate at the specified positions, and this works very well. Suppose I now have the following:

The problem is that I can no longer use sample to draw from slices along the x-axis: sample treats each density value as the probability of its grid point regardless of how far apart the points are, so the part of the distribution that is gridded much more finely (the left side) has a higher probability of being drawn.
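One way to correct for uneven spacing (a Riemann-sum weighting swapped in here, not taken from the original thread) is to multiply each density value by the width of the grid cell around it before passing it to sample. The helper cell_widths is hypothetical, and the demo uses a flat density on [0, 1] so that the correct mean, 0.5, is known in advance.

```r
# Hypothetical helper (not from MASS): width of the cell around each point
# of a sorted, possibly non-uniform grid g. Cell boundaries are the midpoints
# between neighbouring points, so these are trapezoid-rule weights.
cell_widths <- function(g) {
  mids <- (head(g, -1) + tail(g, -1)) / 2
  c(mids, g[length(g)]) - c(g[1], mids)
}

# Demo with a known answer: the flat density f(y) = 1 on [0, 1], stored on a
# grid that is 2x finer on the left half (step 0.005) than on the right (0.01).
g <- c(seq(0, 0.5, length.out = 101), seq(0.51, 1, length.out = 50))
f <- rep(1, length(g))

naive_p    <- f / sum(f)                     # what sample(prob = f) implies
w          <- f * cell_widths(g)             # density x cell width
weighted_p <- w / sum(w)

sum(g * naive_p)     # about 0.417: pulled towards the finer left half
sum(g * weighted_p)  # 0.5, the true mean of Uniform(0, 1)

# Sampling with the width-corrected weights
set.seed(1)
draws <- sample(g, size = 1e4, replace = TRUE, prob = w)
```

Applied to the question's slice, this would mean prob = conditional_probabilty_density$y * cell_widths(conditional_probabilty_density$x), assuming the grid values are sorted.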

What can I do to have a fine grid where I need it, but still sample from the distribution according to its proper densities? Thanks a lot.

Answer 1

Use approx on conditional_probabilty_density with a new n.
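A sketch of what this answer suggests, assuming linear interpolation between the original grid points is acceptable: approx() called with n (and no xout) returns n equally spaced points spanning the range of the original grid, and on an equally spaced grid the density values are again proportional to probabilities, so sample can be used as before. The slice below is made-up illustrative data (a flat density on a grid that is finer on the left), not from the question.

```r
# Made-up slice on a non-uniform grid: flat density on [0, 1], finer on the left
g <- c(seq(0, 0.5, length.out = 101), seq(0.51, 1, length.out = 50))
conditional_probabilty_density <- list(x = g, y = rep(1, length(g)))

# Linearly interpolate onto 200 equally spaced points over range(x)
uniform_slice <- approx(conditional_probabilty_density$x,
                        conditional_probabilty_density$y,
                        n = 200)

# On the uniform grid, sample() weights are proportional to probability again
set.seed(1)
simulated_sample <- sample(uniform_slice$x, size = 1e4, replace = TRUE,
                           prob = uniform_slice$y)
mean(simulated_sample)   # close to 0.5, the true mean of Uniform(0, 1)
```

The new n should be large enough to resolve the narrow features of the finely gridded region; because this is a one-dimensional slice, a large n here costs far less memory than raising n in kde2d.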

Answer 1: score 0, accepted, 2017-11-21 21:22:29

使用可变的底层网格从内核密度估计器进行模拟

问题描述

1 个解决方案

解决方案1 0 已采纳 2017-11-21 21:22:29

解决方案1
0 已采纳 2017-11-21 21:22:29