Error pickling a `matlab` object in joblib `Parallel` context

I'm running some Matlab code in parallel from inside a Python context (I know, but that's what's going on), and I'm hitting an import error involving `matlab.double`. The same code works fine in a `multiprocessing.Pool`, so I'm having trouble figuring out what the problem is. Here's a minimal reproducing test case:

import matlab
from multiprocessing import Pool
from joblib import Parallel, delayed

# A global object that I would like to be available in the parallel subroutine
x = matlab.double([[0.0]])

def f(i):
    print(i, x)

with Pool(4) as p:
    p.map(f, range(10))
    # This prints "0 [[0.0]]", "1 [[0.0]]", ... as expected

for _ in Parallel(4, backend='multiprocessing')(delayed(f)(i) for i in range(10)):
    pass
# This also prints "0 [[0.0]]", "1 [[0.0]]", ... as expected

# Now run with default `backend='loky'`
for _ in Parallel(4)(delayed(f)(i) for i in range(10)):
    pass
# ^ this crashes.

So the only problematic case is the one using the `loky` backend. The full traceback is:

exception calling callback for <Future at 0x7f63b5a57358 state=finished raised BrokenProcessPool>
joblib.externals.loky.process_executor._RemoteTraceback: 
'''
Traceback (most recent call last):
  File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py", line 391, in _process_worker
    call_item = call_queue.get(block=True, timeout=timeout)
  File "~/miniconda3/envs/myenv/lib/python3.6/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
  File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/matlab/mlarray.py", line 31, in <module>
    from _internal.mlarray_sequence import _MLArrayMetaClass
  File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/matlab/_internal/mlarray_sequence.py", line 3, in <module>
    from _internal.mlarray_utils import _get_strides, _get_size, \
  File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/matlab/_internal/mlarray_utils.py", line 4, in <module>
    import matlab
  File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/matlab/__init__.py", line 24, in <module>
    from mlarray import double, single, uint8, int8, uint16, \
ImportError: cannot import name 'double'
'''

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/externals/loky/_base.py", line 625, in _invoke_callbacks
    callback(self)
  File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/parallel.py", line 309, in __call__
    self.parallel.dispatch_next()
  File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/parallel.py", line 731, in dispatch_next
    if not self.dispatch_one_batch(self._original_iterator):
  File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/parallel.py", line 759, in dispatch_one_batch
    self._dispatch(tasks)
  File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/parallel.py", line 716, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 510, in apply_async
    future = self._workers.submit(SafeFunction(func))
  File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/externals/loky/reusable_executor.py", line 151, in submit
    fn, *args, **kwargs)
  File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py", line 1022, in submit
    raise self._flags.broken
joblib.externals.loky.process_executor.BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.
joblib.externals.loky.process_executor._RemoteTraceback: 
'''
Traceback (most recent call last):
  File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py", line 391, in _process_worker
    call_item = call_queue.get(block=True, timeout=timeout)
  File "~/miniconda3/envs/myenv/lib/python3.6/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
  File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/matlab/mlarray.py", line 31, in <module>
    from _internal.mlarray_sequence import _MLArrayMetaClass
  File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/matlab/_internal/mlarray_sequence.py", line 3, in <module>
    from _internal.mlarray_utils import _get_strides, _get_size, \
  File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/matlab/_internal/mlarray_utils.py", line 4, in <module>
    import matlab
  File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/matlab/__init__.py", line 24, in <module>
    from mlarray import double, single, uint8, int8, uint16, \
ImportError: cannot import name 'double'
'''

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "test.py", line 20, in <module>
    for _ in Parallel(4)(delayed(f)(i) for i in range(10)):
  File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/parallel.py", line 934, in __call__
    self.retrieve()
  File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/parallel.py", line 833, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 521, in wrap_future_result
    return future.result(timeout=timeout)
  File "~/miniconda3/envs/myenv/lib/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "~/miniconda3/envs/myenv/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/externals/loky/_base.py", line 625, in _invoke_callbacks
    callback(self)
  File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/parallel.py", line 309, in __call__
    self.parallel.dispatch_next()
  File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/parallel.py", line 731, in dispatch_next
    if not self.dispatch_one_batch(self._original_iterator):
  File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/parallel.py", line 759, in dispatch_one_batch
    self._dispatch(tasks)
  File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/parallel.py", line 716, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 510, in apply_async
    future = self._workers.submit(SafeFunction(func))
  File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/externals/loky/reusable_executor.py", line 151, in submit
    fn, *args, **kwargs)
  File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py", line 1022, in submit
    raise self._flags.broken
joblib.externals.loky.process_executor.BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.

Looking at the traceback, it seems like the root cause is an issue importing the `matlab` package in the child process.

It's probably worth noting that this all runs just fine if I instead define `x = np.array([[0.0]])` (after `import numpy as np`). And of course the main process has no problem with any `matlab` imports, so I'm not sure why the child process would.

I'm not sure whether this error has anything in particular to do with the `matlab` package, or whether it's something to do with global variables and `cloudpickle` or `loky`. In my application it would help to stick with `loky`, so I'd appreciate any insight!

I should also note that I'm using the official MATLAB Engine for Python: https://www.mathworks.com/help/matlab/matlab-engine-for-python.html. I suppose that might make it hard for others to try out the test case, so I wish I could reproduce this error with a type other than `matlab.double`, but I haven't found one yet.

Digging around more, I've noticed that the process of importing the `matlab` package is more circular than I would expect, and I'm speculating that this could be part of the problem. The issue is that when `import matlab` is run by `loky`'s `_ForkingPickler`, first the file `matlab/mlarray.py` is imported, which imports some other files, one of which contains `import matlab`; this causes `matlab/__init__.py` to run, which internally has `from mlarray import double, single, uint8, ...`, and that is the line that causes the crash.
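For reference, here is a minimal, MATLAB-free sketch of the same failure mode, using two hypothetical modules `a.py` (standing in for `matlab/mlarray.py`) and `b.py` (standing in for `matlab/__init__.py`). Everything here depends only on which module Python starts importing first:

# a.py -- stands in for matlab/mlarray.py
import b                      # starts importing b while a is only half-initialized
double = float                # bound only after the import of b returns

# b.py -- stands in for matlab/__init__.py
from a import double          # if a is still mid-initialization, 'double' is
                              # not bound yet -> ImportError: cannot import name 'double'

# main.py
import b    # works: b fully imports a before reading 'double'
# import a  # but importing a FIRST (as the unpickler effectively does) crashes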

Could this circularity be the issue? If so, why can I import this module in the main process but not in the `loky` backend?

The error is caused by an incorrect load order of global objects in the child processes. It can be seen clearly in the traceback chain `_ForkingPickler.loads(res) -> ... -> import matlab -> from mlarray import ...` that `matlab` is not yet fully imported when the global variable `x` is loaded by `cloudpickle`.

`joblib` with `loky` seems to treat modules as normal global objects and sends them dynamically to the child processes. `joblib` doesn't record the order in which those objects/modules were defined, so they are loaded (initialized) in an arbitrary order in the child processes.
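One way to see this (a diagnostic sketch, not part of the fix) is to disassemble the payload that `cloudpickle`, which the `loky` backend uses for task serialization, produces for `f`:

import cloudpickle
import pickletools

# f references the global x; cloudpickle serializes f together with the
# globals it uses, so x (a matlab.double) ends up pickled by value.
payload = cloudpickle.dumps(f)

# The disassembly shows the GLOBAL/STACK_GLOBAL opcodes, i.e. the
# module/attribute references the worker must import while unpickling --
# with no guarantee about the order they are resolved in.
pickletools.dis(payload)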

A simple workaround is to pickle the `matlab` object manually and load it after importing `matlab` inside your function:

import matlab
import pickle

# Pickle the object eagerly, while matlab is fully imported in the parent.
# px is a plain bytes object, which loky can ship to the workers safely.
px = pickle.dumps(matlab.double([[0.0]]))

def f(i):
    import matlab  # make sure the package is fully initialized first
    x = pickle.loads(px)
    print(i, x)

Of course, you can also use `joblib.dump` and `joblib.load` (which serialize via a file) instead of `pickle` to serialize the object.
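With that workaround in place, the failing case from the question should run under the default `loky` backend (a sketch under the same setup as above):

from joblib import Parallel, delayed

# Same call that crashed before; now each worker imports matlab itself
# and unpickles px only after the package is fully initialized.
for _ in Parallel(4)(delayed(f)(i) for i in range(10)):
    pass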

Use an initializer

Thanks to @Aaron's suggestion, you can also use an initializer (for `loky`) to import `matlab` before `x` is loaded.

Currently there's no simple API to specify an initializer, so I wrote a small helper function:

def with_initializer(self, f_init):
    # Overwrite initializer hook in the Loky ProcessPoolExecutor
    # https://github.com/tomMoral/loky/blob/f4739e123acb711781e46581d5ed31ed8201c7a9/loky/process_executor.py#L850
    hasattr(self._backend, '_workers') or self.__enter__()  # make sure the executor (and its workers) exists
    origin_init = self._backend._workers._initializer
    def new_init():
        origin_init()
        f_init()
    self._backend._workers._initializer = new_init if callable(origin_init) else f_init
    return self

It's a little hacky, but it works well with the current versions of joblib and loky. You can then use it like this:

import matlab
from joblib import Parallel, delayed

x = matlab.double([[0.0]])

def f(i):
    print(i, x)

def _init_matlab():
    import matlab

with Parallel(4) as p:
    for _ in with_initializer(p, _init_matlab)(delayed(f)(i) for i in range(10)):
        pass

I hope the joblib developers will add an `initializer` argument to the `Parallel` constructor in the future.
