
Rejection Sampling to generate Normal samples from Cauchy samples

I tried my luck at coding a rejection sampling method to generate a sample that follows a normal distribution. The samples look normally distributed at first glance, but the p-value of the Shapiro-Wilk test is always < 0.05. I don't really know where I went wrong, and I only got the pseudo-code from my teacher (it's NOT homework). Any help is appreciated. My code is below:

f <- function(x,m,v) {    #target distribution, m=mean,v=variance
  dnorm(x,m,sqrt(v))
}

g <- function(x,x0,lambda) {  #cauchy distribution for sampling
  dcauchy(x,x0,lambda)
}

genSamp <- function(n,m,v) {  #I want the user to be able to choose the mean,
                              #the variance and the size of the sample
  stProbe <- rep(0,n)         #the sample vector
  interval = c(m-10*sqrt(v),m+10*sqrt(v)) #I wanted to make sure that everything
                                          #is covered, so I took a range
                                          #that depends on the mean
  M = max(f(interval,m,v)/g(interval,m,v))  #rescaling coefficient, so the cauchy distribution
                              #is never under the normal distribution
  #I chose x0 = m and lambda = v, so the cauchy distribution is close to
  #the target normal distribution

  for (i in 1:n) {
    repeat{
      x <- rcauchy(1,m,v)
      u <- runif(1,0,max(f(interval,m,v)))
      if(u < (f(x,m,v)/(M*g(x,m,v)))) {
        break
      }
    }
    stProbe[i] <- x
  }

  return(stProbe)
}

Then I tried it out with:

test <- genSamp(100,2,0.5)
hist(test,prob=T,breaks=30)#looked not bad
shapiro.test(test) #p-value way below 0.05

Thank you in advance for your help.

Actually, the first thing I checked was the sample mean and sample variance. When I draw 1000 samples with your genSamp, I get a sample mean of 2, but a sample variance of about 2.64, far from the target 0.5.

The 1st problem is with your computation of M. Note that:

interval = c(m - 10 * sqrt(v), m + 10 * sqrt(v))

only gives you 2 values, rather than a grid of equally spaced points on the interval. At 10 standard deviations away from the mean, the Normal density is almost 0, so the ratio f/g at both points is almost 0 and M is almost 0. You need to do something like

interval <- seq(m - 10 * sqrt(v), m + 10 * sqrt(v), by = 0.01)
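To make the difference concrete, here is a small sketch comparing the two-endpoint M with the grid-based M for the parameters used in the question (m = 2, v = 0.5, Cauchy scale lambda = v). The closed-form value is not part of the original answer; it is my own derivation from maximising f/g for a Normal target and a Cauchy proposal centred at the same location, so treat it as an assumption to check against the grid. For these parameters it comes out at roughly 1.67.

```r
# Target N(m, v) and proposal Cauchy(location = m, scale = lambda), as in the question
m <- 2; v <- 0.5; lambda <- v
s <- sqrt(v)
ratio <- function(x) dnorm(x, m, s) / dcauchy(x, m, lambda)

# Two endpoints only: both lie 10 sd from the mean, where the ratio is ~0
M_ends <- max(ratio(c(m - 10 * s, m + 10 * s)))

# Fine grid over the same interval: approximates the true maximum
M_grid <- max(ratio(seq(m - 10 * s, m + 10 * s, by = 0.01)))

# Closed form (hand-derived, an assumption added here): the ratio peaks where
# (x - m)^2 = 2 * v - lambda^2 if that is positive, otherwise at x = m
t2 <- max(2 * v - lambda^2, 0)
M_exact <- (pi * lambda / (s * sqrt(2 * pi))) * exp(-t2 / (2 * v)) * (1 + t2 / lambda^2)

print(c(ends = M_ends, grid = M_grid, exact = M_exact))
```

The endpoint value is many orders of magnitude too small, while the grid value agrees closely with the closed form.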

The 2nd problem is how you generate the uniform random variable inside your repeat loop. Why do you do

u <- runif(1,0,max(f(interval,m,v)))

The acceptance ratio f(x)/(M*g(x)) lies between 0 and 1, so u has to be uniform on (0, 1). You want

u <- runif(1, 0, 1)
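As a rough check of why u belongs on (0, 1), the sketch below evaluates the acceptance ratio for a large batch of Cauchy proposals and compares the empirical acceptance rate with 1/M, which is the theoretical acceptance rate of rejection sampling. It is vectorised for brevity rather than using the repeat loop, and the names x, acc_prob and accepted are only illustrative.

```r
set.seed(1)
m <- 2; v <- 0.5; s <- sqrt(v)

# Grid-based bound M, as in the fix for the 1st problem
grid <- seq(m - 10 * s, m + 10 * s, by = 0.01)
M <- max(dnorm(grid, m, s) / dcauchy(grid, m, v))

x <- rcauchy(1e5, m, v)                               # proposals from the Cauchy
acc_prob <- dnorm(x, m, s) / (M * dcauchy(x, m, v))   # acceptance probability per proposal
u <- runif(length(x))                                 # U(0, 1), the same scale as acc_prob
accepted <- x[u < acc_prob]

range(acc_prob)      # essentially within [0, 1]; M comes from a grid, so the top end is approximate
mean(u < acc_prob)   # empirical acceptance rate
1 / M                # theoretical acceptance rate
var(accepted)        # should be close to the target variance v
```

Drawing u from a different range, such as (0, max f), compares the ratio against the wrong scale, so the accepted points no longer follow the target density.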

With these fixes, I have tested that genSamp gets the correct sample mean and sample variance. The samples pass both the Shapiro–Wilk test and the Kolmogorov-Smirnov test (?ks.test).


Full working code

f <- function(x,m,v) dnorm(x,m,sqrt(v))

g <- function(x,x0,lambda) dcauchy(x,x0,lambda)

genSamp <- function(n,m,v) {

  stProbe <- rep(0,n)
  interval <- seq(m - 10 * sqrt(v), m + 10 * sqrt(v), by = 0.01)
  M = max(f(interval,m,v)/g(interval,m,v))

  for (i in 1:n) {
    repeat{
      x <- rcauchy(1,m,v)
      u <- runif(1,0,1)
      if(u < (f(x,m,v)/(M*g(x,m,v)))) break
      }
    stProbe[i] <- x
    }

  return(stProbe)
  }

set.seed(0)
test <- genSamp(1000, 2, 0.5)
shapiro.test(test)$p.value
#[1] 0.1563038

ks.test(test, rnorm(1000, 2, sqrt(0.5)))$p.value
#[1] 0.7590978
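The ks.test call above compares the sample with a second random sample drawn by rnorm, so its p-value also depends on that extra draw. A possible alternative is the one-sample form, which tests directly against the theoretical Normal CDF. The sketch below is self-contained, so it uses a vectorised helper rnorm_reject, which is a hypothetical name and not the genSamp from this answer; no p-value is quoted because it was not run here.

```r
set.seed(0)
m <- 2; v <- 0.5; s <- sqrt(v)

# Vectorised rejection sampler (hypothetical helper, same idea as genSamp)
rnorm_reject <- function(n, m, v) {
  s <- sqrt(v)
  grid <- seq(m - 10 * s, m + 10 * s, by = 0.01)
  M <- max(dnorm(grid, m, s) / dcauchy(grid, m, v))
  out <- numeric(0)
  while (length(out) < n) {
    x <- rcauchy(2 * n, m, v)                                    # batch of proposals
    keep <- runif(2 * n) < dnorm(x, m, s) / (M * dcauchy(x, m, v))
    out <- c(out, x[keep])                                       # keep accepted draws
  }
  out[seq_len(n)]
}

test <- rnorm_reject(1000, m, v)
res <- ks.test(test, "pnorm", mean = m, sd = s)   # one-sample test against N(m, v)
res$p.value
```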

You have

f <- function(x,m,v) {    #target distribution, m=mean,v=variance
  dnorm(x,e,sqrt(v))
}

which evaluates the density with mean e, but e is never defined. It should be m.
