R: Generating data from a probability density distribution

Question

Say I have a simple array, with a corresponding probability distribution.

library(stats)    
data <- c(0,0.08,0.15,0.28,0.90)
pdf_of_data <- density(data, from= 0, to=1, bw=0.1)

Is there a way to generate another set of data from the same distribution? Since the operation is probabilistic, the new data need not match the initial distribution exactly; it only needs to be drawn from it.

I did have success finding a simple solution on my own. Thanks!

Answer 1

The examples in the documentation of ?density (almost) give you the answer.

So, something like this should do it:

library("stats")    
data <- c(0,0.08,0.15,0.28,0.90)
pdf_of_data <- density(data, from= 0, to=1, bw=0.1)

# From the example.
N <- 1e6
x.new <- rnorm(N, sample(data, size = N, replace = TRUE), pdf_of_data$bw)

# Histogram of the draws with the distribution superimposed.
hist(x.new, freq = FALSE)
lines(pdf_of_data)


You can simply reject the draws that fall outside your interval, as in rejection sampling. Alternatively, you can use the algorithm described in the link.
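The rejection step mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions: `draw_truncated` is a hypothetical helper name, the seed is arbitrary, and the interval [0, 1] and bandwidth 0.1 are taken from the `density()` call in the question.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

data <- c(0, 0.08, 0.15, 0.28, 0.90)
bw <- 0.1  # same bandwidth as in the density() call

# Hypothetical helper: draw n values from the Gaussian kernel mixture,
# keeping only those that land inside [lower, upper].
draw_truncated <- function(n, data, bw, lower = 0, upper = 1) {
  out <- numeric(0)
  while (length(out) < n) {
    # Propose: pick a data point at random, then add Gaussian kernel noise.
    cand <- rnorm(n, mean = sample(data, size = n, replace = TRUE), sd = bw)
    # Reject proposals that fall outside the interval.
    out <- c(out, cand[cand >= lower & cand <= upper])
  }
  out[seq_len(n)]
}

x.trunc <- draw_truncated(1e4, data, bw)
range(x.trunc)
```

Note that `from` and `to` in `density()` only set the evaluation grid and do not renormalise the estimate, so a histogram of the truncated draws will sit somewhat above `lines(pdf_of_data)`.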

Answer 2

Your best bet is to generate the empirical cumulative distribution function, approximate its inverse, and then transform uniform random input through it.

The compound expression looks like:

random.points <- approx(
  cumsum(pdf_of_data$y)/sum(pdf_of_data$y),
  pdf_of_data$x,
  runif(10000)
)$y

The result can be checked with a histogram:

hist(random.points, 100)
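For readers who prefer the steps spelled out, the compound expression can be unpacked as in the sketch below. The names `cdf` and `inv_cdf` and the seed are illustrative, and `rule = 2` is an addition that is not part of the answer: with the default `rule = 1`, uniform draws smaller than the first CDF value come back as `NA`.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

data <- c(0, 0.08, 0.15, 0.28, 0.90)
pdf_of_data <- density(data, from = 0, to = 1, bw = 0.1)

# Step 1: approximate the CDF on the density grid (normalised cumulative sum).
cdf <- cumsum(pdf_of_data$y) / sum(pdf_of_data$y)

# Step 2: approximate the inverse CDF by swapping the roles of x and y.
# rule = 2 (an addition in this sketch) returns the nearest grid endpoint
# instead of NA when a uniform draw falls below the smallest CDF value.
inv_cdf <- approxfun(cdf, pdf_of_data$x, rule = 2)

# Step 3: push uniform draws through the approximate inverse CDF.
random.points <- inv_cdf(runif(10000))

hist(random.points, 100)
```

Because the CDF is built only on the grid from 0 to 1, all draws stay inside that interval, which is a difference from the kernel-noise approach in Answer 1.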

Answer 3

Draw from the curve:

sample(pdf_of_data$x, 1e6, TRUE, pdf_of_data$y)
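One thing to keep in mind with this approach is that the draws can only take values on the evaluation grid of `density()` (512 points by default). The sketch below illustrates this and shows one possible way to smooth the result; the jitter step is an assumption of this sketch, not part of the answer, and the names `grid.draws`, `smooth.draws`, and `step` are illustrative.

```r
set.seed(7)  # arbitrary seed, only for reproducibility

data <- c(0, 0.08, 0.15, 0.28, 0.90)
pdf_of_data <- density(data, from = 0, to = 1, bw = 0.1)

# Draw grid points with probability proportional to the density height.
# sample() rescales prob internally, so it need not sum to 1.
grid.draws <- sample(pdf_of_data$x, 1e5, replace = TRUE, prob = pdf_of_data$y)

# Every draw is one of the grid points.
all(grid.draws %in% pdf_of_data$x)

# Optional smoothing (an assumption of this sketch): spread each draw
# uniformly over its grid cell. Values can then overshoot the interval
# by at most half a grid step.
step <- diff(pdf_of_data$x[1:2])
smooth.draws <- grid.draws + runif(length(grid.draws), -step / 2, step / 2)
```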

Answer 1: score 8, posted 2015-09-30 16:54:00
Answer 2: score 7, accepted, posted 2015-09-30 17:40:31
Answer 3: score 3, posted 2015-09-30 16:56:30
