
Simulate from kernel density estimator with variable underlying grid

I have a dataset that I'm using to create an empirical probability distribution by estimating a kernel density. Right now I'm using R's kde2d from the MASS package. After estimating the probability distribution, I use sample to sample from slices of the 2D distribution along the x-axis. I use sample much as described here. Example code looks like this:

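library(MASS)
set.seed(123)
x = rnorm(100, 1, 0.1)
set.seed(456)
y = rnorm(100, 1, 0.5)
# 2D kernel density estimate on a regular 50 x 50 grid over [-2, 2] x [-2, 2]
den <- kde2d(x, y, n = 50, lims = c(-2, 2, -2, 2))
# to plot this 2d kde (persp is in base graphics, no extra package needed):
# persp(den)
# slice at x = den$x[40]: density values along the y grid
conditional_probabilty_density = list(x = den$y, y = den$z[40, ])
# to plot the slice:
# plot(conditional_probabilty_density)
# draw 10 y grid points with probability proportional to the slice density
simulated_sample = sample(conditional_probabilty_density$x, size = 10, replace = TRUE, prob = conditional_probabilty_density$y)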

The den looks like this:

[Figure: kde2d]

My data has known areas with a lot of fluctuation, which require a fine grid granularity. Other areas have basically no data points, and nothing is going on there. I would be fine if I could just set the n parameter of kde2d to a very high number in order to have a good resolution of my data everywhere. Alas, this is not possible due to memory constraints.

That's why I thought I could modify the kde2d function to have a non-constant granularity. Here is the source code of the kde2d function. One can modify the line

gy <- seq.int(lims[3L], lims[4L], length.out = n[2L])

and put whatever granularity is wished for on the y-axis. For example:

a <- seq(-1, 0, 0.5)
gy <- c(a, seq.int(0.1, 2, length.out = n[2L]-length(a)))

The modified kde2d then returns the kernel density estimate at the specified positions. This works very well. Suppose I now have

[Figure: kde2d_2]
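The modification described above can be sketched as a standalone function. This is only a sketch, assuming the published body of MASS::kde2d (Gaussian kernel, bandwidths divided by 4); kde2d_grid and its arguments gx and gy are hypothetical names, not part of MASS, and the particular non-uniform gy below is purely illustrative.

```r
library(MASS)  # only for bandwidth.nrd(), the default bandwidth rule used by kde2d

# Hypothetical helper (not part of MASS): a kde2d variant evaluated at
# arbitrary grid points gx and gy instead of a regular n x n grid.
kde2d_grid <- function(x, y, gx, gy, h = c(bandwidth.nrd(x), bandwidth.nrd(y))) {
  nx <- length(x)
  stopifnot(length(y) == nx, all(is.finite(c(x, y, gx, gy))))
  h <- rep(h, length.out = 2L) / 4   # same bandwidth scaling as kde2d
  ax <- outer(gx, x, "-") / h[1L]    # standardized distances, grid vs data
  ay <- outer(gy, y, "-") / h[2L]
  z <- tcrossprod(matrix(dnorm(ax), , nx), matrix(dnorm(ay), , nx)) /
    (nx * h[1L] * h[2L])
  list(x = gx, y = gy, z = z)
}

set.seed(123)
x <- rnorm(100, 1, 0.1)
set.seed(456)
y <- rnorm(100, 1, 0.5)

gx <- seq(-2, 2, length.out = 50)
# coarse where there is little data, fine where the data sit (illustrative choice)
gy <- c(seq(-2, 0, by = 0.5), seq(0.05, 2, by = 0.05))
den2 <- kde2d_grid(x, y, gx, gy)
dim(den2$z)  # length(gx) rows, length(gy) columns
```

Passing the grid vectors in as arguments avoids editing the function each time the granularity changes, and on a regular grid the result should agree with kde2d itself.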

The problem is that I can no longer use sample to sample from slices along the x-axis, because the part on the left side of the distribution has a much finer grid, so it contains more grid points and thus has a higher probability of being drawn by sample.
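The effect can be reproduced with a small sketch. It assumes a standard normal density as a stand-in for one slice of the KDE, on an illustrative grid that is fine on the left and coarse on the right; trapezoid_weights is a hypothetical helper. Weighting each grid point by density alone overstates the finely gridded half, while weighting by density times the width each point represents (the trapezoidal rule weights) restores roughly the correct mass of 0.5 on each side.

```r
# Non-uniform grid: fine on the left half, coarse on the right half (illustrative)
grid <- c(seq(-4, 0, by = 0.01), seq(0.5, 4, by = 0.5))
dens <- dnorm(grid)  # stand-in for one slice of the KDE

# Hypothetical helper: trapezoidal rule weights, i.e. half the distance to
# each neighbouring grid point
trapezoid_weights <- function(g) {
  d <- diff(g)
  (c(0, d) + c(d, 0)) / 2
}
w <- trapezoid_weights(grid)

# Probability mass assigned to grid points below 0 under each weighting
p_naive    <- sum(dens[grid < 0]) / sum(dens)
p_weighted <- sum((dens * w)[grid < 0]) / sum(dens * w)

set.seed(1)
s_naive    <- sample(grid, size = 10000, replace = TRUE, prob = dens)
s_weighted <- sample(grid, size = 10000, replace = TRUE, prob = dens * w)
c(naive = mean(s_naive < 0), weighted = mean(s_weighted < 0))
```

The naive fraction ends up far above one half purely because of where the grid points sit, not because of the density itself.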

What can I do to have a fine grid where I need it, but still sample from the distribution according to its proper densities? Thanks a lot.

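Use approx on conditional_probabilty_density with a new n. This interpolates the slice onto an equally spaced grid, so that sample again weights every region according to its density.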
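A minimal sketch of this suggestion, assuming a standard normal density on an illustrative non-uniform grid as a stand-in for the real slice (the names slice and regular, and the choice n = 801, are assumptions for the example). approx linearly interpolates the slice at n equally spaced points between the smallest and largest grid value, and the interpolated values are then passed to sample as weights.

```r
# Stand-in for a slice computed on a non-uniform grid (fine on the left, illustrative)
slice <- list(x = c(seq(-4, 0, by = 0.01), seq(0.5, 4, by = 0.5)))
slice$y <- dnorm(slice$x)

# Linearly interpolate the slice onto an equally spaced grid with a new n
regular <- approx(slice$x, slice$y, n = 801)

set.seed(1)
simulated_sample <- sample(regular$x, size = 10000, replace = TRUE, prob = regular$y)
mean(simulated_sample < 0)  # close to 0.5 for this symmetric stand-in density
```

Linear interpolation only fills in values between the existing grid points, so the coarse regions keep their coarse shape. The new n mainly needs to be large enough that the finely gridded regions are not smoothed away.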

