
AssertionError in PyMC3 code using Deterministic distributions

I have been trying for the past few days to get accustomed to PyMC, with the goal of eventually running MCMC sampling on distributions from models for which I have standalone code (I'm ultimately interested in parameter estimation).

To my knowledge there aren't many examples out there of people showing code (such as an external C or Fortran code) that they have successfully made work with PyMC3. So far I have found bits of information here and here.

So, to start with simple problems, I tried to reproduce with PyMC3 some results from existing Python code that does MCMC with 'complicated' distributions (read: more involved than the doc examples) without using PyMC, namely this simple result, which runs in 2 seconds for 10,000 samples on my laptop.

To define Deterministic variables in PyMC3, one should use the @theano.compile.ops.as_op decorator (in PyMC it was @deterministic, which is now deprecated in PyMC3), and this is what I did.

Here is my code (I use Spyder, and hence IPython), inspired by the PyMC documentation, in which I encounter an AssertionError after the second cell (in the Estimation process, so before sampling). I have been trying to solve this for the last two days, but I really do not understand what the error can be. I believe it is some kind of PyMC or Theano trick that I have not caught yet, as I think I'm really close.

#%% Define model
import numpy,math
import matplotlib.pyplot as plt
import random as random

import theano.tensor as t
import theano

random.seed(1)  # set random seed

# copy-pasted function from the specific model used in the source and adapted with as_op 
@theano.compile.ops.as_op(itypes=[t.iscalar, t.dscalar, t.fscalar, t.fscalar],otypes=[t.dvector])
def sampleFromSalpeter(N, alpha, M_min, M_max):        
    log_M_Min = math.log(M_min)
    log_M_Max = math.log(M_max)
    maxlik = math.pow(M_min, 1.0 - alpha)
    Masses = []
    while (len(Masses) < N):            
        logM = random.uniform(log_M_Min,log_M_Max)
        M    = math.exp(logM)            
        likelihood = math.pow(M, 1.0 - alpha)            
        u = random.uniform(0.0,maxlik)
        if (u < likelihood):
            Masses.append(M)
    return Masses

# SAME function as above, used to make test data (so no Theano here)
def sampleFromSalpeterDATA(N, alpha, M_min, M_max):        
    log_M_Min = math.log(M_min)
    log_M_Max = math.log(M_max)        
    maxlik = math.pow(M_min, 1.0 - alpha)        
    Masses = []        
    while (len(Masses) < N):            
        logM = random.uniform(log_M_Min,log_M_Max)
        M    = math.exp(logM)            
        likelihood = math.pow(M, 1.0 - alpha)            
        u = random.uniform(0.0,maxlik)
        if (u < likelihood):
            Masses.append(M)
    return Masses

# Generate toy data.
N      = 1000000  # Draw 1 Million stellar masses.
alpha  = 2.35
M_min  = 1.0
M_max  = 100.0
Masses = sampleFromSalpeterDATA(N, alpha, M_min, M_max)

#%% Estimation process
import pymc3 as pm
basic_model = pm.Model()
with basic_model:

    # Priors for unknown model parameters
    alpha2 = pm.Normal('alpha2', mu=3, sd=10)

    N2=t.constant(1000000)
    M_min2  = t.constant(1.0)
    M_max2  = t.constant(100.0)

    # Expected value of outcome
    m =  sampleFromSalpeter(N2, alpha2, M_min2, M_max2)

    # Likelihood (sampling distribution) of observations
    Y_obs = pm.Normal('Y_obs', mu=m, sd=10, observed=Masses)

#%% Sample
with basic_model:
    step = pm.Metropolis()
    trace = pm.sample(10000, step=step)

#%% Plot posteriors
_ = pm.traceplot(trace)
pm.summary(trace)

I got an answer on the PyMC Discourse. I was indeed pretty close: I just had to change the last line of the function definition to return [numpy.array(Masses)], so that the output matches the declared otypes=[t.dvector].
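For reference, the fix boils down to converting the Python list into a NumPy array before returning it. A standalone sketch of the corrected sampler (written without the Theano decorator so it runs anywhere; the snake_case name is mine):

```python
import math
import random

import numpy

def sample_from_salpeter(N, alpha, M_min, M_max):
    """Rejection-sample N masses from a Salpeter-like power law p(M) ~ M**-alpha."""
    log_M_min = math.log(M_min)
    log_M_max = math.log(M_max)
    # For alpha > 1 the unnormalized likelihood M**(1-alpha) peaks at M_min.
    maxlik = math.pow(M_min, 1.0 - alpha)
    masses = []
    while len(masses) < N:
        # Propose uniformly in log-mass, then accept/reject.
        M = math.exp(random.uniform(log_M_min, log_M_max))
        likelihood = math.pow(M, 1.0 - alpha)
        if random.uniform(0.0, maxlik) < likelihood:
            masses.append(M)
    # The fix: return a NumPy array, not a plain list, so the result is
    # a valid dvector; inside the as_op-decorated version it is additionally
    # wrapped in a list, one entry per declared output.
    return numpy.array(masses)
```

In the decorated function the last line becomes `return [numpy.array(Masses)]`, since as_op expects one output object per entry of otypes.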

Now it seems that this code will take about 300 hours to run (instead of 2 seconds in the original implementation without PyMC), and I was wondering if anyone had any idea why it is so much slower. I had actually tried really simple distributions with as_op earlier and noticed a considerable slowdown, but not by nearly that much. I guess the original code works directly on the logp; is that what makes the whole difference?
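That guess points in the right direction: the referenced implementation evaluates the analytic log-likelihood of the power law once per proposed alpha, whereas the as_op version re-simulates a million-point dataset by rejection sampling at every Metropolis step. A minimal sketch of the direct-logp approach in plain NumPy (function names are mine, a flat prior over alpha is assumed rather than the Normal(3, 10) prior from the model above):

```python
import numpy as np

def salpeter_loglike(alpha, log_masses, M_min=1.0, M_max=100.0):
    """Analytic log-likelihood of p(M) = c * M**-alpha on [M_min, M_max]."""
    # Normalization constant of the truncated power law.
    c = (1.0 - alpha) / (M_max**(1.0 - alpha) - M_min**(1.0 - alpha))
    return len(log_masses) * np.log(c) - alpha * log_masses.sum()

def metropolis_alpha(masses, n_samples=1000, alpha0=3.0, step=0.05, seed=1):
    """Random-walk Metropolis on alpha alone, driven by the analytic logp."""
    rng = np.random.default_rng(seed)
    log_m = np.log(masses)  # precompute once; the data never change
    alpha = alpha0
    logp = salpeter_loglike(alpha, log_m)
    trace = np.empty(n_samples)
    for i in range(n_samples):
        prop = alpha + step * rng.standard_normal()
        logp_prop = salpeter_loglike(prop, log_m)
        # Accept with probability min(1, exp(logp_prop - logp)).
        if np.log(rng.random()) < logp_prop - logp:
            alpha, logp = prop, logp_prop
        trace[i] = alpha
    return trace
```

Each iteration here costs one scalar log-likelihood evaluation (the sum over the data is precomputed), which is why the original code finishes in seconds while a per-step million-sample rejection loop does not.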
