Using multiprocessing with the emcee library inside a class

Question

I have tried to use the emcee library to implement Markov chain Monte Carlo (MCMC) inside a class, and also to make the multiprocessing module work, but after running test code like this:

import numpy as np
import emcee
import scipy.optimize as op

# Choose the "true" parameters.
m_true = -0.9594
b_true = 4.294
f_true = 0.534

# Generate some synthetic data from the model.
N = 50
x = np.sort(10 * np.random.rand(N))
yerr = 0.1 + 0.5 * np.random.rand(N)
y = m_true * x + b_true
y += np.abs(f_true * y) * np.random.randn(N)
y += yerr * np.random.randn(N)


class modelfit(object):
    def __init__(self):
        # Data to fit (the module-level arrays above).
        self.x = x
        self.y = y
        self.yerr = yerr
        # Starting guesses for the maximum-likelihood fit.
        self.m = -0.6
        self.b = 2.0
        self.f = 0.9

    def get_results(self):
        # Maximum-likelihood estimate: minimize the negative log-likelihood.
        def nll(a):
            return -self.lnlike(a)
        result = op.minimize(nll, [self.m, self.b, np.log(self.f)],
                             options={'gtol': 1e-6, 'disp': True})
        return result["x"]

    def lnprior(self, theta):
        # Flat prior inside a box, zero probability outside it.
        m, b, lnf = theta
        if -5.0 < m < 0.5 and 0.0 < b < 10.0 and -10.0 < lnf < 1.0:
            return 0.0
        return -np.inf

    def lnprob(self, theta):
        lp = self.lnprior(theta)
        if not np.isfinite(lp):
            # Outside the prior: skip the likelihood evaluation.
            return -np.inf
        return lp + self.lnlike(theta)

    def lnlike(self, theta):
        # Gaussian likelihood with an extra variance term f**2 * model**2 (lnf = ln f).
        m, b, lnf = theta
        model = m * self.x + b
        inv_sigma2 = 1.0 / (self.yerr**2 + model**2 * np.exp(2 * lnf))
        return -0.5 * (np.sum((self.y - model)**2 * inv_sigma2 - np.log(inv_sigma2)))

    def run_mcmc(self, nstep):
        ndim, nwalkers = 3, 100
        # Start the walkers in a tiny Gaussian ball around the ML estimate.
        p_ml = self.get_results()
        pos = [p_ml + 1e-4 * np.random.randn(ndim) for i in range(nwalkers)]
        # threads=10 makes emcee evaluate lnprob in a multiprocessing pool,
        # which has to pickle the bound method self.lnprob.
        self.sampler = emcee.EnsembleSampler(nwalkers, ndim, self.lnprob, threads=10)
        self.sampler.run_mcmc(pos, nstep)


test = modelfit()
test.get_results()
test.run_mcmc(5000)

I got this error message:

File "MCMC_model.py", line 157, in run_mcmc
    self.sampler.run_mcmc(theta0, nstep)
  File "build/bdist.linux-x86_64/egg/emcee/sampler.py", line 157, in run_mcmc
  File "build/bdist.linux-x86_64/egg/emcee/ensemble.py", line 198, in sample
  File "build/bdist.linux-x86_64/egg/emcee/ensemble.py", line 382, in _get_lnprob
  File "build/bdist.linux-x86_64/egg/emcee/interruptible_pool.py", line 94, in map
  File "/vol/aibn84/data2/zahra/anaconda/lib/python2.7/multiprocessing/pool.py", line 558, in get
    raise self._value
cPickle.PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed

I suspect it has something to do with how I have used multiprocessing in the class, but I could not figure out how to keep the structure of my class as it is while also using multiprocessing.

I would appreciate any tips.

PS: I should mention that the code works perfectly if I remove threads=10 from the last function.

Answer 1

There are a number of SO questions that discuss what's going on:

…including this one, which seems to be your response… to nearly the same question:

https://stackoverflow.com/a/25388586/2379433

However, the difference here is that you are not using multiprocessing directly; emcee is. Therefore, the pathos.multiprocessing solution (from the links above) is not available to you. Since emcee uses cPickle, you'll have to stick to things that pickle knows how to serialize, and you are out of luck for bound methods of class instances (the instancemethod in your traceback). Typical workarounds are either to use copy_reg to register the type of object you want to serialize, or to add a __reduce__ method that tells Python how to serialize it. Several of the answers at the links above suggest similar things… but none of them let you keep the class the way you have written it.
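A minimal sketch of the two workarounds named above, written for Python 3 (where the module is called copyreg rather than copy_reg). It registers a reducer for bound methods with copyreg.pickle, and separately shows a small callable wrapper whose __reduce__ rebuilds it from its constructor arguments. ToyModel and LogProb are hypothetical stand-ins, not part of emcee or the question's code. Two caveats: recent Python 3 versions already pickle bound methods in essentially this way, so the registration mainly matters on Python 2; and either approach still pickles the owning instance, so every attribute on it must be picklable too.

```python
import copyreg  # spelled copy_reg on Python 2
import pickle
import types


def _reduce_method(method):
    # Describe a bound method as getattr(instance, name); pickle can then
    # rebuild it from the (picklable) instance and the method's name.
    return getattr, (method.__self__, method.__func__.__name__)


# Workaround 1: register a reducer for the bound-method type (process-wide).
copyreg.pickle(types.MethodType, _reduce_method)


class LogProb:
    """Workaround 2 (hypothetical helper): a picklable callable wrapper."""

    def __init__(self, model, name="lnprob"):
        self.model = model
        self.name = name

    def __call__(self, theta):
        # Forward the call to the wrapped model's method.
        return getattr(self.model, self.name)(theta)

    def __reduce__(self):
        # Tell pickle to rebuild the wrapper as LogProb(model, name).
        return LogProb, (self.model, self.name)


class ToyModel:
    """Hypothetical stand-in for the question's modelfit class."""

    def __init__(self, scale):
        self.scale = scale

    def lnprob(self, theta):
        return -self.scale * sum(t * t for t in theta)


model = ToyModel(0.5)
restored_method = pickle.loads(pickle.dumps(model.lnprob))
restored_wrapper = pickle.loads(pickle.dumps(LogProb(model)))
print(restored_method([1.0, 2.0]), restored_wrapper([1.0, 2.0]))  # -2.5 -2.5
```

Because the copyreg registration is global to the process, it applies to every bound method pickled afterwards; the receiving side only needs getattr and the class definition to rebuild it.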

Answer 2

For the record, you can now create a pathos.multiprocessing pool and pass it to emcee using the pool argument. However, be aware that the overhead of multiprocessing can actually slow things down unless your likelihood is particularly time-consuming to compute.
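emcee documents its pool argument as any object with a map method that follows the calling sequence of the built-in map, so a pathos pool can be dropped in without changing the model class. The sketch below does not use emcee itself: SerialPool and evaluate_walkers are hypothetical helpers that illustrate that contract, plus the cheap in-process alternative to a process pool when the likelihood is fast.

```python
class SerialPool:
    """Hypothetical pool stand-in (not part of emcee).

    It exposes only map(func, iterable), the calling sequence emcee expects
    from its pool argument. Everything runs in the current process, so
    nothing is pickled and there is no worker start-up or transfer overhead.
    """

    def map(self, func, iterable):
        return list(map(func, iterable))


def evaluate_walkers(log_prob, positions, pool=None):
    """Hypothetical sketch of how a sampler fans out log-probability calls:
    one call per walker position, routed through pool.map when a pool is given."""
    mapper = pool.map if pool is not None else map
    return list(mapper(log_prob, positions))


def log_prob(theta):
    # Cheap toy target (standard normal up to a constant); for something this
    # fast, process-pool overhead would likely dominate the run time.
    return -0.5 * sum(t * t for t in theta)


positions = [[1.0, 0.0], [1.0, 2.0]]
print(evaluate_walkers(log_prob, positions, pool=SerialPool()))  # [-0.5, -2.5]
```

With emcee installed, the corresponding call would be along the lines of emcee.EnsembleSampler(nwalkers, ndim, self.lnprob, pool=pool), where pool is, for example, a pathos ProcessingPool; timing both variants on your own likelihood is the practical way to see whether the process overhead pays off.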

2 answers

Answer 1: 1, 2015-03-29 12:24:09

Answer 2: 1, 2016-01-17 20:53:51
