Exponential distribution in R

I want to simulate some data from an Exp(1) distribution, but the values have to be > 0.5, so I used a while loop. It does not seem to work the way I would like it to. Thanks in advance for your responses!

x1 <- c()
w <- rexp(1)
while (length(x1) < 100) {
  if (w > 0.5) {
    x1 <- w
  } else {
    w <- rexp(1)
  }
}

1) The code in the question has these problems:

  • we need a new random variable on each iteration, but the code only generates one when the if condition is FALSE

  • x1 is repeatedly overwritten rather than extended, so its length never exceeds 1 and the loop never terminates

  • although while could be used, repeat seems better here, since a test at the end of the loop is a better fit than a test at the beginning

We can fix it like this:

x1 <- c()
repeat {
  w <- rexp(1)
  if (w > 0.5) {
    x1 <- c(x1, w)
    if (length(x1) == 100) break
  }
}

1a) A variation would be the following. Note that an if with no else branch evaluates to NULL when its condition is FALSE, so if the condition is FALSE on the line marked ## then nothing is concatenated to x1.

x1 <- c()
repeat {
  w <- rexp(1)
  x1 <- c(x1, if (w > 0.5) w)  ##
  if (length(x1) == 100) break
}
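The NULL behaviour that variation (1a) relies on can be checked directly. This is only a small illustration; the variable name res is made up for the example.

```r
# An if without an else branch yields NULL when its condition is FALSE.
res <- if (FALSE) 1
is.null(res)

# Concatenating NULL adds nothing, so the vector keeps its length.
v <- c(1, 2, res)
length(v)
```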

2) Alternatively, this generates 200 exponential random variables and keeps only those greater than 0.5. If fewer than 100 remain, it repeats. At the end it takes the first 100 from the last batch generated. We have chosen 200 to be sufficiently large that on most runs only one iteration of the loop will be needed.

repeat {
  r <- rexp(200)
  r <- r[r > 0.5]
  if (length(r) >= 100) break
}
r <- head(r, 100)
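As a rough check on the batch size of 200 used in (2), the chance that a single batch leaves fewer than 100 values can be computed from the binomial distribution. This is a sketch that assumes independent draws from Exp(1); the names p_keep and p_retry are introduced only for illustration.

```r
# Probability that one Exp(1) draw exceeds 0.5 (about 0.61).
p_keep <- exp(-0.5)

# Probability that fewer than 100 of 200 draws are kept,
# i.e. that the repeat loop needs another pass.
p_retry <- pbinom(99, size = 200, prob = p_keep)
p_retry
```

The retry probability comes out below 1 percent, which is consistent with the loop usually finishing in one iteration.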

Alternative (2) is actually faster than (1) or (1a) because it is more highly vectorized, even though it throws away more exponential random variables than the other solutions.
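One way to compare the speed of (1) and (2) yourself is to wrap each in a function and time repeated calls. The wrappers sample_loop and sample_batch below are hypothetical names written for this comparison, and the timings will differ from machine to machine, so no numbers are quoted.

```r
# Approach (1): draw one value at a time and grow the vector.
sample_loop <- function(n = 100, cutoff = 0.5) {
  x <- c()
  repeat {
    w <- rexp(1)
    if (w > cutoff) {
      x <- c(x, w)
      if (length(x) == n) break
    }
  }
  x
}

# Approach (2): draw a whole batch, filter it, retry if too few remain.
sample_batch <- function(n = 100, cutoff = 0.5, batch = 2 * n) {
  repeat {
    r <- rexp(batch)
    r <- r[r > cutoff]
    if (length(r) >= n) break
  }
  head(r, n)
}

set.seed(42)
x_loop <- sample_loop()
x_batch <- sample_batch()

# Time 1000 calls of each; elapsed seconds depend on the machine.
t_loop <- system.time(for (i in 1:1000) sample_loop())
t_batch <- system.time(for (i in 1:1000) sample_batch())
print(c(loop = t_loop[["elapsed"]], batch = t_batch[["elapsed"]]))
```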

I would advise against a while (or any other accept/reject) loop; instead, use the methods from truncdist:

# Sample 1000 observations from a truncated exponential
library(truncdist)
x <- rtrunc(1000, spec = "exp", a = 0.5)

# Plot
library(ggplot2)
ggplot(data.frame(x = x), aes(x)) + geom_histogram(bins = 50) + xlim(0, 10)


It's also fairly straightforward to implement a sampler that uses inverse transform sampling to draw from a truncated exponential distribution, which avoids rejecting samples in a loop. This will be a more efficient method than any accept/reject-based sampling method, and it works particularly well in your case, since the truncated exponential CDF has a closed form. See for example this post for more details.
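A minimal sketch of such an inverse transform sampler follows, assuming truncation from below only (values greater than a, no upper bound). In that case the truncated CDF is F(x) = 1 - exp(-rate * (x - a)) for x >= a, and inverting it gives x = a - log(1 - u) / rate for u uniform on (0, 1). The function name rtrunc_exp is hypothetical and not part of any package.

```r
# Inverse transform sampler for Exp(rate) truncated to (a, Inf).
# rtrunc_exp is an illustrative helper, not a library function.
rtrunc_exp <- function(n, rate = 1, a = 0.5) {
  u <- runif(n)            # uniform draws on the open interval (0, 1)
  a - log1p(-u) / rate     # invert F(x) = 1 - exp(-rate * (x - a))
}

set.seed(1)
x <- rtrunc_exp(100)
range(x)
```

Every draw is used, so no loop is needed. Because the exponential distribution is memoryless, the same distribution is also obtained with 0.5 + rexp(100).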
