Sample from custom distribution in R

I have implemented an alternate parameterization of the negative binomial distribution in R, like so (also see here):

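nb = function(n, l, a){
  # n: count, l: mean (lambda), a: overdispersion (size)
  # choose(n + a - 1, n) equals choose(n + a - 1, a - 1) for integer a, but stays
  # correct for non-integer a (choose() rounds a non-integer second argument)
  first = choose((n + a - 1), n)
  second = (l/(l+a))^n
  third = (a/(l+a))^a
  return(first*second*third)
}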

Here n is the count, l (lambda) is the mean, and a is the overdispersion term.
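As a quick sanity check, the formula above matches the mean/size parameterization that base R's `dnbinom()` already supports, with `mu = l` and `size = a`. The sketch below is an illustration, not the asker's code: it writes the same PMF on the log scale (the name `nb_log` is hypothetical), which avoids `choose()` and handles non-integer `a` exactly, and compares it with `dnbinom()`:

```r
# Hypothetical log-scale version of nb(): lgamma() replaces choose(),
# so a non-integer overdispersion a is handled exactly
nb_log <- function(n, l, a) {
  log_p <- lgamma(n + a) - lgamma(a) - lgamma(n + 1) +
    n * log(l / (l + a)) + a * log(a / (l + a))
  exp(log_p)
}

counts <- 0:50
# Compare against base R's mean/size parameterization (mu = l, size = a)
all.equal(nb_log(counts, l = 5, a = 2.5), dnbinom(counts, size = 2.5, mu = 5))
```

If the two agree, `dnbinom()` can serve as a reference PMF when validating samples drawn with any of the methods below.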

I would like to draw random samples from this distribution in order to validate my implementation of a negative binomial mixture model, but I am not sure how to go about it. The CDF of this function isn't easily defined, so I considered trying rejection sampling as discussed here, but that didn't work either. I'm not sure why: the article says to first draw from a uniform distribution between 0 and 1, but I want my NB distribution to model integer counts, so I'm not sure I fully understand this approach.

Thank you for your help.

I recommend you look up the Uniform distribution as well as the Universality of the Uniform. You can do exactly what you want by passing a uniformly distributed variable to the inverse CDF of the negative binomial; what you get is a set of points sampled from your negative binomial distribution.
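For a discrete distribution, "inverse CDF" means the generalized inverse, and that is what turns a continuous uniform draw on (0, 1) into an integer count. Writing $F$ for the CDF built from the PMF above and $U$ for a Uniform(0, 1) draw:

```latex
F(k) = \sum_{j=0}^{k} P(N = j), \qquad
X = F^{-1}(U) = \min\{\, k \in \{0, 1, 2, \dots\} : F(k) \ge U \,\}
```

Since $P(X = k) = P(F(k-1) < U \le F(k)) = F(k) - F(k-1) = P(N = k)$ (with $F(-1) = 0$), $X$ has exactly the target distribution, even though $F$ has no closed-form inverse.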

EDIT: I see that the negative binomial has a CDF with no closed-form inverse. My second recommendation would be to scrap your function and use a built-in:

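library(MASS)
l <- 5; a <- 2   # example mean and overdispersion
# Draw 1000 counts; theta plays the role of a (the "size" parameter)
samples <- rnegbin(1000, mu = l, theta = a)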
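Base R's stats package also ships a sampler with the same mean/size parameterization, `rnbinom(n, size, mu = ...)`, so loading MASS is optional. A minimal sketch, assuming example values l = 5 and a = 2, that draws samples and compares their moments with the theoretical mean $l$ and variance $l + l^2/a$ of this parameterization:

```r
set.seed(1)                 # reproducible draws
l <- 5                      # example mean (lambda)
a <- 2                      # example overdispersion (size)
samples <- rnbinom(1e5, size = a, mu = l)

# Empirical versus theoretical moments
c(empirical_mean = mean(samples), theoretical_mean = l)
c(empirical_var = var(samples), theoretical_var = l + l^2 / a)
```

Matching moments is only a coarse check; comparing empirical frequencies with the PMF count by count is more thorough.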

It seems like you could:

1) Draw a uniform random number between zero and one.

2) Accumulate the probability mass function count by count, starting from zero (the "integral" is really just a sum, since the distribution is discrete and lower-bounded at zero).

3) The first count at which this running CDF reaches or exceeds your random number is your random draw.

So all together, do something like the following:

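# Hypothetical example: PMF of the nb() function above with l = 5, a = 2
PMF <- function(i) nb(i, l = 5, a = 2)
r <- runif(1, 0, 1)   # uniform draw on (0, 1)
cdf <- 0              # running CDF, P(X <= i)
i <- -1
while (cdf < r) {
  i <- i + 1
  p <- PMF(i)         # probability mass at count i
  cdf <- cdf + p
}
# i now holds the sampled count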

Here PMF(i) is the probability mass at a count of i, as specified by the parameters of the distribution. The value of i when this while loop finishes is your sample.
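The loop produces one value per run, which gets slow when many samples are needed. A vectorized sketch of the same inversion idea (the function name `sample_by_inversion` and the truncation tolerance `tol` are hypothetical) tabulates the CDF once and then uses base R's `findInterval()` to find, for every uniform draw at once, the smallest count whose CDF reaches it. It assumes the supplied PMF is vectorized over counts; `nb()` from the question qualifies, and `dnbinom()` is used here only to keep the block self-contained:

```r
# Hypothetical helper: inversion sampling from any vectorized PMF on 0, 1, 2, ...
sample_by_inversion <- function(m, pmf, tol = 1e-12) {
  k_max <- 16L
  repeat {
    # Tabulate F(0), F(1), ..., F(k_max)
    cdf <- cumsum(pmf(0:k_max))
    # Stop once the mass beyond k_max is below tol
    if (cdf[length(cdf)] >= 1 - tol) break
    if (k_max > 1e7) stop("PMF mass does not approach 1; check the PMF")
    k_max <- 2L * k_max
  }
  u <- runif(m)
  # With left.open = TRUE, findInterval() counts the table entries strictly
  # below u, which is exactly the smallest k with F(k) >= u
  k <- findInterval(u, cdf, left.open = TRUE)
  # A u above the truncated table (probability below tol) is clamped to k_max
  pmin(k, k_max)
}

set.seed(1)
x <- sample_by_inversion(1e5, function(k) dnbinom(k, size = 2, mu = 5))
```

Building the table once costs a few dozen PMF evaluations; after that each draw is a binary search rather than a fresh sum.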

If you just want to test your implementation, so speed is not an issue, the inversion method mentioned by others is probably the way to go.

For a discrete random variable, it requires a simple while loop. See Non-Uniform Random Variate Generation by L. Devroye, chapter 3, p. 85.
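Such a while loop can also avoid recomputing `choose()` at every step. For this parameterization, consecutive probabilities satisfy $P(k+1) = P(k)\,\frac{k+a}{k+1}\,\frac{l}{l+a}$, starting from $P(0) = \left(\frac{a}{l+a}\right)^a$. The following is a sketch of sequential-search inversion using that recurrence; the name `rnb_seq` is hypothetical, and this is one common variant rather than the specific algorithm on the cited page:

```r
# Hypothetical sequential-search sampler for the mean/overdispersion NB
rnb_seq <- function(m, l, a) {
  ratio <- l / (l + a)        # factor l/(l+a) in the PMF recurrence
  p0 <- (a / (l + a))^a       # P(0)
  out <- integer(m)
  for (j in seq_len(m)) {
    u <- runif(1)             # uniform draw on (0, 1)
    k <- 0L
    p <- p0                   # P(k)
    cdf <- p                  # running CDF, F(k)
    while (cdf < u) {
      p <- p * (k + a) / (k + 1) * ratio   # P(k + 1) from P(k)
      k <- k + 1L
      cdf <- cdf + p
    }
    out[j] <- k
  }
  out
}

set.seed(3)
x <- rnb_seq(20000, l = 5, a = 2)
```

Because the recurrence uses only arithmetic on `k` and `a`, it works for non-integer overdispersion as well.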

Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.