Using multiprocessing in emcee library inside a class
I have tried to use the emcee library to implement Markov chain Monte Carlo inside a class and to get the multiprocessing module working with it, but after running this test code:
import numpy as np
import emcee
import scipy.optimize as op
# Choose the "true" parameters.
m_true = -0.9594
b_true = 4.294
f_true = 0.534
# Generate some synthetic data from the model.
N = 50
x = np.sort(10*np.random.rand(N))
yerr = 0.1+0.5*np.random.rand(N)
y = m_true*x+b_true
y += np.abs(f_true*y) * np.random.randn(N)
y += yerr * np.random.randn(N)
class modelfit():
    def __init__(self):
        self.x=x
        self.y=y
        self.yerr=yerr
        self.m=-0.6
        self.b=2.0
        self.f=0.9
    def get_results(self):
        def func(a):
            model=a[0]*self.x+a[1]
            inv_sigma2 = 1.0/(self.yerr**2 + model**2*np.exp(2*a[2]))
            return 0.5*(np.sum((self.y-model)**2*inv_sigma2 + np.log(inv_sigma2)))
        result = op.minimize(func, [self.m, self.b, np.log(self.f)],options={'gtol': 1e-6, 'disp': True})
        m_ml, b_ml, lnf_ml = result["x"]
        return result["x"]
    def lnprior(self,theta):
        m, b, lnf = theta
        if -5.0 < m < 0.5 and 0.0 < b < 10.0 and -10.0 < lnf < 1.0:
            return 0.0
        return -np.inf
    def lnprob(self,theta):
        lp = self.lnprior(theta)
        likelihood=self.lnlike(theta)
        if not np.isfinite(lp):
            return -np.inf
        return lp + likelihood
    def lnlike(self,theta):
        m, b, lnf = theta
        model = m * self.x + b
        inv_sigma2 = 1.0/(self.yerr**2 + model**2*np.exp(2*lnf))
        return -0.5*(np.sum((self.y-model)**2*inv_sigma2 - np.log(inv_sigma2)))
    def run_mcmc(self,nstep):
        ndim, nwalkers = 3, 100
        pos = [self.get_results() + 1e-4*np.random.randn(ndim) for i in range(nwalkers)]
        self.sampler = emcee.EnsembleSampler(nwalkers, ndim, self.lnprob,threads=10)
        self.sampler.run_mcmc(pos, nstep)
test=modelfit()
test.x=x
test.y=y
test.yerr=yerr
test.get_results()
test.run_mcmc(5000)
I got this error message:
  File "MCMC_model.py", line 157, in run_mcmc
    self.sampler.run_mcmc(theta0, nstep)
  File "build/bdist.linux-x86_64/egg/emcee/sampler.py", line 157, in run_mcmc
  File "build/bdist.linux-x86_64/egg/emcee/ensemble.py", line 198, in sample
  File "build/bdist.linux-x86_64/egg/emcee/ensemble.py", line 382, in _get_lnprob
  File "build/bdist.linux-x86_64/egg/emcee/interruptible_pool.py", line 94, in map
  File "/vol/aibn84/data2/zahra/anaconda/lib/python2.7/multiprocessing/pool.py", line 558, in get
    raise self._value
cPickle.PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed
I reckon it has something to do with how I have used multiprocessing in the class, but I could not figure out how to keep the structure of my class the way it is and still use multiprocessing.
Any tips would be appreciated.
PS: I should mention that the code works perfectly if I remove threads=10 from the last function.
There are a number of SO questions that discuss what's going on:
https://stackoverflow.com/a/21345273/2379433
https://stackoverflow.com/a/28887474/2379433
https://stackoverflow.com/a/21345308/2379433
https://stackoverflow.com/a/29129084/2379433
…including this one, which seems to be your response… to nearly the same question:
However, the difference here is that you are not using multiprocessing directly; emcee is. Therefore, the pathos.multiprocessing solution (from the links above) is not available to you. Since emcee uses cPickle, you'll have to stick to things that pickle knows how to serialize. You are out of luck for class instances: as the traceback shows, cPickle cannot serialize the bound method self.lnprob. Typical workarounds are to either use copy_reg to register the type of object you want to serialize, or to add a __reduce__ method to tell Python how to serialize it. Several of the answers in the links above suggest similar things, but none of them lets you keep the class the way you have written it.
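To illustrate the __reduce__ idea, here is a minimal sketch written for Python 3 with only the standard library (under Python 3, copy_reg is named copyreg). LineModel is a hypothetical stand-in for the class in the question, and its likelihood is the same formula rewritten without numpy. Note that this only tells pickle how to rebuild the instance; under Python 2 a bound method such as self.lnprob would still need a copy_reg registration for the method type.

```python
import math
import pickle


class LineModel(object):
    """Hypothetical stand-in for the model class in the question."""

    def __init__(self, x, y, yerr):
        self.x = list(x)
        self.y = list(y)
        self.yerr = list(yerr)

    def lnlike(self, theta):
        # Same likelihood as in the question, written without numpy.
        m, b, lnf = theta
        total = 0.0
        for xi, yi, ei in zip(self.x, self.y, self.yerr):
            model = m * xi + b
            inv_sigma2 = 1.0 / (ei ** 2 + model ** 2 * math.exp(2 * lnf))
            total += (yi - model) ** 2 * inv_sigma2 - math.log(inv_sigma2)
        return -0.5 * total

    def __reduce__(self):
        # Tell pickle to rebuild the object by calling
        # LineModel(x, y, yerr) with the stored data.
        return (LineModel, (self.x, self.y, self.yerr))


model = LineModel([0.0, 1.0, 2.0], [4.3, 3.2, 2.5], [0.2, 0.3, 0.2])

# Round-trip through pickle, as a worker process would receive it.
clone = pickle.loads(pickle.dumps(model))

theta = (-0.9, 4.3, -1.0)
print(clone.lnlike(theta) == model.lnlike(theta))
```

The trade-off is the one noted above: the class has to be modified so that pickle knows how to serialize it, so it no longer stays exactly as originally written.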
For the record, you can now create a pathos.multiprocessing pool and pass it to emcee using the pool argument. However, be aware that the overhead of multiprocessing can actually slow things down unless your likelihood is particularly time-consuming to compute.