
Population Monte Carlo implementation

I am trying to implement the Population Monte Carlo algorithm described in this paper (see Figure 3 on page 78) for a simple model (see the function model()) with a single parameter, using Python. Unfortunately, the algorithm does not work and I cannot figure out what is wrong. See my implementation below. The actual algorithm is the function abc(); all the other functions can be seen as helper functions and seem to work fine.

To check whether the algorithm works, I first generate observed data with the model's only parameter set to param = 8. The posterior produced by the ABC algorithm should therefore be centered around 8. This is not the case, and I am wondering why.

I would appreciate any help or comments.

# imports

from math import sqrt
import numpy as np
import random
from scipy.stats import norm


# globals
N = 300              # sample size
N_PARTICLE = 300      # number of particles
ITERS = 5            # number of decreasing thresholds
M = 10               # number of words to remember
MEAN = 7             # prior mean of parameter
SD = 2               # prior sd of parameter


def model(param):
  # probability of recalling all M items given the parameter
  recall_prob_all = 1 / (1 + np.exp(M - param))
  # per-item recall probability consistent with recall_prob_all over M items
  recall_prob_one_item = np.exp(np.log(recall_prob_all) / float(M))
  # simulate M Bernoulli trials; return the number of recalled items
  return sum(1 if random.random() < recall_prob_one_item else 0 for item in range(M))

## example
print "Output of model function: \n" + str(model(10)) + "\n"

# generate data from model
def generate(param):
  out = np.empty(N)
  for i in range(N):
    out[i] = model(param)
  return out

## example

print "Output of generate function: \n" + str(generate(10)) + "\n"


# distance function (sum of squared error)
def distance(obsData,simData):
  out = 0.0
  for i in range(len(obsData)):
    out += (obsData[i] - simData[i]) * (obsData[i] - simData[i])
  return out

## example

print "Output of distance function: \n" + str(distance([1,2,3],[4,5,6])) + "\n"


# sample new particles based on weights
def sample(particles, weights):
  # draw a single particle according to the weights; omitting the size
  # argument makes np.random.choice return a scalar rather than an array
  return np.random.choice(particles, p=weights)

## example

print "Output of sample function: \n" + str(sample([1,2,3],[0.1,0.1,0.8])) + "\n"


# perturbance function
def perturb(variance):
  return np.random.normal(0,sqrt(variance),1)[0]

## example 

print "Output of perturb function: \n" + str(perturb(1)) + "\n"

# compute new weight
def computeWeight(prevWeights, prevParticles, prevVariance, currentParticle):
  # importance weight: prior density of the new particle divided by a
  # mixture of Gaussian perturbation kernels centered at the previous particles
  denom = 0.0
  proposal = norm(currentParticle, sqrt(prevVariance))
  prior = norm(MEAN, SD)
  for i in range(len(prevParticles)):
    denom += prevWeights[i] * proposal.pdf(prevParticles[i])
  return prior.pdf(currentParticle) / denom


## example 

prevWeights = [0.2,0.3,0.5]
prevParticles = [1,2,3]
prevVariance = 1
currentParticle = 2.5
print "Output of computeWeight function: \n" + str(computeWeight(prevWeights,prevParticles,prevVariance,currentParticle)) + "\n"


# normalize weights
def normalize(weights):
  return weights/np.sum(weights)


## example 

print "Output of normalize function: \n" + str(normalize([3.,5.,9.])) + "\n"


# sampling from prior distribution
def rprior():
  return np.random.normal(MEAN,SD,1)[0]

## example 

print "Output of rprior function: \n" + str(rprior()) + "\n"


# ABC using Population Monte Carlo sampling
def abc(obsData,eps):
  draw = 0
  Distance = 1e9
  variance = np.empty(ITERS)
  simData = np.empty(N)
  particles = np.empty([ITERS,N_PARTICLE])
  weights = np.empty([ITERS,N_PARTICLE])

  for t in range(ITERS):
    if t == 0:
      # first population: rejection-sample from the prior until the
      # simulated data fall within the first threshold
      for i in range(N_PARTICLE):
        while Distance > eps[t]:
          draw = rprior()
          simData = generate(draw)
          Distance = distance(obsData, simData)

        Distance = 1e9
        particles[t][i] = draw
        weights[t][i] = 1. / N_PARTICLE

      variance[t] = 2 * np.var(particles[t])
      continue

    # later populations: resample a particle from the previous population,
    # perturb it, and accept once the simulated data fall within eps[t]
    for i in range(N_PARTICLE):
      while Distance > eps[t]:
        draw = sample(particles[t-1], weights[t-1])
        draw += perturb(variance[t-1])
        simData = generate(draw)
        Distance = distance(obsData, simData)

      Distance = 1e9
      particles[t][i] = draw
      weights[t][i] = computeWeight(weights[t-1], particles[t-1], variance[t-1], particles[t][i])

    weights[t] = normalize(weights[t])
    variance[t] = 2 * np.var(particles[t])

  return particles[ITERS-1]



true_param = 8   # the parameter used to generate the observed data (see text above)
obsData = generate(true_param)
eps = [15000, 10000, 8000, 6000, 3000]
posterior = abc(obsData, eps)
#print(posterior)

I stumbled upon this question while looking for a pythonic implementation of a PMC algorithm, since, coincidentally, I am currently applying the techniques from this very paper to my own research.

Could you post the results you are getting? My guess is that either 1) you are using a poor choice of distance function (and/or similarity thresholds), or 2) you are not using enough particles. I may be wrong here (I am not very well versed in sample statistics), but your distance function implicitly suggests to me that the ordering of the random draws matters. I would have to think about this more to determine whether it actually has any effect on the convergence properties (it may not), but why not simply use the mean or the median as your sample statistic? (A minimal sketch of that change follows below.)
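For illustration, a minimal sketch of that suggestion (the name distance_mean and this code are mine, not shown in the original answer): an order-invariant distance that compares the sample means of the observed and simulated data, as a drop-in replacement for the question's distance().

import numpy as np

# hypothetical order-invariant distance: compare summary statistics
# (sample means) instead of summing squared errors over paired draws
def distance_mean(obsData, simData):
  return abs(np.mean(obsData) - np.mean(simData))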

I ran your code with 1000 particles and a true parameter of 8, using the absolute difference between the sample means as my distance function, for three iterations with epsilons of [0.5, 0.3, 0.1]. The peak of my estimated posterior distribution seems to approach 8 on every iteration, along with a decrease in the population variance. Note that there is still a noticeable rightward bias, but this is a consequence of the asymmetry of your model (parameter values of 8 or less can never result in more than 8 observed successes, while all parameter values greater than 8 can, leading to a rightward skew in the distribution).
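For reference, a minimal sketch of that configuration, assuming the question's script above with distance_mean swapped in for distance(); the exact script behind this run is not shown, so the values below are reconstructed from the description:

# hypothetical driver reproducing the run described above; assumes
# generate() and abc() from the question, with the mean-based distance
N_PARTICLE = 1000        # 1000 particles instead of 300
ITERS = 3                # three decreasing thresholds
true_param = 8
obsData = generate(true_param)
eps = [0.5, 0.3, 0.1]    # thresholds on |mean(obs) - mean(sim)|
posterior = abc(obsData, eps)
print("Posterior mean: " + str(np.mean(posterior)))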

Here is a plot of my results:

[Figure: particle evolution over the three iterations, converging on the true value of 8 and showing the slight asymmetric tendency toward higher parameter values.]

