R: Generate data from a probability density distribution

Say I have a simple array and a corresponding estimate of its probability density.

library(stats)
data <- c(0, 0.08, 0.15, 0.28, 0.90)
pdf_of_data <- density(data, from = 0, to = 1, bw = 0.1)

Is there a way I could generate another set of data using the same distribution? As the operation is probabilistic, the new data need not match the initial distribution exactly; it only has to be drawn from it.

I did not have success finding a simple solution on my own. Thanks!

The examples in the documentation of ?density (almost) give you the answer.

So, something like this should do it:

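library("stats")
data <- c(0, 0.08, 0.15, 0.28, 0.90)
pdf_of_data <- density(data, from = 0, to = 1, bw = 0.1)

# From the example in ?density: resample the data with replacement and
# add Gaussian noise whose standard deviation is the bandwidth.
N <- 1e6
x.new <- rnorm(N, sample(data, size = N, replace = TRUE), pdf_of_data$bw)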

# Histogram of the draws with the distribution superimposed.
hist(x.new, freq = FALSE)
lines(pdf_of_data)

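The reason the resample-and-add-noise draw works can be written out, assuming the default Gaussian kernel of density() and writing h for the bandwidth pdf_of_data$bw and x_1, ..., x_n for the data (the symbols are introduced here for illustration only). The kernel density estimate is an equal-weight mixture of normal densities centred at the data points, so picking a data point uniformly at random and adding N(0, h^2) noise is a draw from that mixture:

```latex
\hat{f}(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h}\,
             \varphi\!\left(\frac{x - x_i}{h}\right),
\qquad
X = x_I + h Z,\quad I \sim \mathrm{Uniform}\{1,\dots,n\},\quad Z \sim N(0,1)
\;\Longrightarrow\; X \sim \hat{f}.
```

Here \varphi is the standard normal density. Note that from = 0 and to = 1 only restrict the grid on which the estimate is evaluated, so draws made this way can still fall outside [0, 1].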

You can simply reject the draws that fall outside your interval, as in rejection sampling. Alternatively, you can use the algorithm described in the link.
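The rejection step could be sketched roughly as follows. The helper name rkde_truncated and its arguments are hypothetical (not part of any package), and the sketch assumes the interval of interest is [0, 1] and the Gaussian kernel with bw = 0.1 from the question:

```r
set.seed(1)  # only to make this sketch reproducible
data <- c(0, 0.08, 0.15, 0.28, 0.90)
bw <- 0.1    # same bandwidth as in the question

# Hypothetical helper: draw n values from the Gaussian kernel density
# estimate of `data`, keeping only the draws inside [lower, upper].
rkde_truncated <- function(n, data, bw, lower = 0, upper = 1) {
  out <- numeric(0)
  while (length(out) < n) {
    # Propose from the untruncated estimate: a resampled data point
    # plus N(0, bw^2) noise.
    draws <- rnorm(n, mean = sample(data, size = n, replace = TRUE), sd = bw)
    # Reject the proposals that fall outside the interval.
    out <- c(out, draws[draws >= lower & draws <= upper])
  }
  out[seq_len(n)]
}

x.trunc <- rkde_truncated(1e4, data, bw)
range(x.trunc)
```

The accepted draws follow the estimate renormalized to the interval, so a histogram of them will sit slightly above the curve drawn by lines(pdf_of_data), which is not renormalized.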

Your best bet is to generate the empirical cumulative distribution function, approximate its inverse, and then transform uniform random input through it.

The compound expression looks like

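random.points <- approx(
  cumsum(pdf_of_data$y) / sum(pdf_of_data$y),  # approximate CDF on the grid
  pdf_of_data$x,                               # grid points
  runif(10000),                                # uniform draws to transform
  rule = 2  # map draws below the first CDF value to the grid edge, not NA
)$y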

This yields

hist(random.points, 100)

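For readers who prefer the three steps spelled out, here is a minimal unrolled sketch of the same idea. The names cdf and inv_cdf are introduced here for illustration; rule = 2 is used so that uniform draws below the first CDF value map to the edge of the grid rather than to NA:

```r
set.seed(1)  # only to make this sketch reproducible
data <- c(0, 0.08, 0.15, 0.28, 0.90)
pdf_of_data <- density(data, from = 0, to = 1, bw = 0.1)

# 1. Approximate the CDF on the evaluation grid by a normalized cumulative sum.
cdf <- cumsum(pdf_of_data$y) / sum(pdf_of_data$y)

# 2. Approximate the inverse CDF (quantile function) by swapping the axes
#    and interpolating linearly.
inv_cdf <- approxfun(cdf, pdf_of_data$x, rule = 2)

# 3. Transform uniform draws through the approximate inverse CDF.
random.points <- inv_cdf(runif(10000))

# Rough sanity check: the sample mean should be close to the mean of the
# estimate computed on the grid.
grid.mean <- sum(pdf_of_data$x * pdf_of_data$y) / sum(pdf_of_data$y)
c(sample = mean(random.points), grid = grid.mean)
```

Unlike the resample-and-add-noise approach, every draw here lies inside the evaluation interval [0, 1], because the inverse is only defined on the grid.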

To draw from the curve:

sample(pdf_of_data$x, 1e6, TRUE, pdf_of_data$y)
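One thing to keep in mind with this one-liner is that every draw is one of the grid points at which density() evaluated the estimate (512 by default), so the result is discrete. A minimal sketch with named arguments follows; the jitter step at the end is an assumption added here for illustration, not part of the original answer:

```r
set.seed(1)  # only to make this sketch reproducible
data <- c(0, 0.08, 0.15, 0.28, 0.90)
pdf_of_data <- density(data, from = 0, to = 1, bw = 0.1)

# Draw grid points with probability proportional to the estimated density;
# sample() normalizes the weights itself.
draws <- sample(pdf_of_data$x, size = 1e5, replace = TRUE, prob = pdf_of_data$y)

# Every draw coincides with a grid point.
all(draws %in% pdf_of_data$x)

# Optional smoothing (an assumption of this sketch): spread each draw
# uniformly over its own grid cell.
step <- diff(pdf_of_data$x[1:2])
draws.jittered <- draws + runif(length(draws), -step / 2, step / 2)
```

With the jitter, values can land up to half a grid step outside [0, 1]; drop or clamp them if the interval must be respected exactly.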
