Can I use dask.delayed on a function wrapped with ctypes?

The goal is to use dask.delayed to parallelize some 'embarrassingly parallel' sections of my code. The code involves calling a Python function which wraps a C function using ctypes. To understand the errors I was getting, I wrote a very basic example.

The C function:

double zippy_sum(double x, double y)
{
    return x + y;
}

The Python:

from dask.distributed import Client
client = Client(n_workers = 4)
client

import os
import dask
import ctypes

current_dir = os.getcwd() #os.path.abspath(os.path.dirname(__file__))
_mod = ctypes.cdll.LoadLibrary(os.path.join(current_dir, "zippy.so"))

_zippy_sum = _mod.zippy_sum
_zippy_sum.argtypes = [ctypes.c_double, ctypes.c_double]
_zippy_sum.restype = ctypes.c_double

def zippy(x, y):

    z = _zippy_sum(x, y)

    return z

result = dask.delayed(zippy)(1., 2.)
result.compute()

The traceback:

---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/.edm/envs/evaxi3.6/lib/python3.6/site-packages/distributed/worker.py in dumps_function(func)
   3286 with _cache_lock:
-> 3287 result = cache_dumps[func]
   3288 except KeyError:

~/.edm/envs/evaxi3.6/lib/python3.6/site-packages/distributed/utils.py in __getitem__(self, key)
   1517 def __getitem__(self, key):
-> 1518 value = super().__getitem__(key)
   1519 self.data.move_to_end(key)

~/.edm/envs/evaxi3.6/lib/python3.6/collections/__init__.py in __getitem__(self, key)
    990 return self.__class__.__missing__(self, key)
--> 991 raise KeyError(key)
    992 def __setitem__(self, key, item): self.data[key] = item

KeyError: <function zippy at 0x11ffc50d0>

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last)
~/.edm/envs/evaxi3.6/lib/python3.6/site-packages/distributed/protocol/pickle.py in dumps(x)
     40 if b"__main__" in result:
---> 41 return cloudpickle.dumps(x, protocol=pickle.HIGHEST_PROTOCOL)
     42 else:

~/.edm/envs/evaxi3.6/lib/python3.6/site-packages/cloudpickle/cloudpickle.py in dumps(obj, protocol)
   1147 cp = CloudPickler(file, protocol=protocol)
-> 1148 cp.dump(obj)
   1149 return file.getvalue()

~/.edm/envs/evaxi3.6/lib/python3.6/site-packages/cloudpickle/cloudpickle.py in dump(self, obj)
    490 try:
--> 491 return Pickler.dump(self, obj)
    492 except RuntimeError as e:

~/.edm/envs/evaxi3.6/lib/python3.6/pickle.py in dump(self, obj)
    408 self.framer.start_framing()
--> 409 self.save(obj)
    410 self.write(STOP)

~/.edm/envs/evaxi3.6/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
    475 if f is not None:
--> 476 f(self, obj) # Call unbound method with explicit self
    477 return

~/.edm/envs/evaxi3.6/lib/python3.6/site-packages/cloudpickle/cloudpickle.py in save_function(self, obj, name)
    565 else:
--> 566 return self.save_function_tuple(obj)
    567

~/.edm/envs/evaxi3.6/lib/python3.6/site-packages/cloudpickle/cloudpickle.py in save_function_tuple(self, func)
    779 state['kwdefaults'] = func.__kwdefaults__
--> 780 save(state)
    781 write(pickle.TUPLE)

~/.edm/envs/evaxi3.6/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
    475 if f is not None:
--> 476 f(self, obj) # Call unbound method with explicit self
    477 return

~/.edm/envs/evaxi3.6/lib/python3.6/pickle.py in save_dict(self, obj)
    820 self.memoize(obj)
--> 821 self._batch_setitems(obj.items())
    822

~/.edm/envs/evaxi3.6/lib/python3.6/pickle.py in _batch_setitems(self, items)
    846 save(k)
--> 847 save(v)
    848 write(SETITEMS)

~/.edm/envs/evaxi3.6/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
    475 if f is not None:
--> 476 f(self, obj) # Call unbound method with explicit self
    477 return

~/.edm/envs/evaxi3.6/lib/python3.6/pickle.py in save_dict(self, obj)
    820 self.memoize(obj)
--> 821 self._batch_setitems(obj.items())
    822

~/.edm/envs/evaxi3.6/lib/python3.6/pickle.py in _batch_setitems(self, items)
    851 save(k)
--> 852 save(v)
    853 write(SETITEM)

~/.edm/envs/evaxi3.6/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
    495 if reduce is not None:
--> 496 rv = reduce(self.proto)
    497 else:

ValueError: ctypes objects containing pointers cannot be pickled

Unfortunately, I still do not understand the errors! I am just getting started with dask and only have some basic experience with ctypes. Does anyone have suggestions for how to tackle this, or even for understanding what needs to be tackled?

Thanks!

Indeed, you cannot serialise a function that references a C function in its closure or its arguments. However, if your function lives in a module which is accessible to all workers, then you end up serialising only a reference to it (the module and function name), and Python does the right thing: each worker imports the module itself and loads the C library locally.
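The difference can be seen with the standard library alone. The following is a minimal sketch, not the code from the question: `c_style_sum` is a hypothetical stand-in for `_zippy_sum`, built with `ctypes.CFUNCTYPE` so that no compiled library is needed, and `json.dumps` is simply a convenient function that lives in an importable module.

```python
import ctypes
import json
import pickle

# Hypothetical stand-in for `_zippy_sum`: a ctypes function pointer created
# from a Python callable, so the sketch runs without zippy.so.
PROTOTYPE = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double, ctypes.c_double)
c_style_sum = PROTOTYPE(lambda x, y: x + y)


def try_pickle(obj):
    """Return the pickled bytes, or the exception raised while pickling."""
    try:
        return pickle.dumps(obj)
    except Exception as exc:
        return exc


# A ctypes function pointer cannot be pickled; the exception is returned here.
pointer_result = try_pickle(c_style_sum)

# A function from an importable module is pickled by reference: the payload
# carries the module and function names, not the function body.
by_reference = try_pickle(json.dumps)

print(type(pointer_result).__name__, pointer_result)
print(by_reference)
```

When the function is defined in `__main__`, as in the question, it has to be shipped by value, which drags its module-level globals (including the ctypes function pointer) into the pickle. That is where the ValueError in the traceback comes from.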

Module zippy.py (somewhere on your Python path, perhaps the current directory for this example):

import os
import dask
import ctypes

current_dir = os.getcwd() #os.path.abspath(os.path.dirname(__file__))
_mod = ctypes.cdll.LoadLibrary(os.path.join(current_dir, "zippy.so"))

_zippy_sum = _mod.zippy_sum
_zippy_sum.argtypes = [ctypes.c_double, ctypes.c_double]
_zippy_sum.restype = ctypes.c_double

def zippy(x, y):

    z = _zippy_sum(x, y)

    return z

Main script:

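import dask
from dask.distributed import Client
import zippy

if __name__ == "__main__":
    # if running as a script, this guard is helpful: worker processes may
    # re-import the main module, and should not start a cluster themselves
    client = Client(n_workers=4)

    # zippy.zippy is importable on the workers, so it is sent by reference
    result = dask.delayed(zippy.zippy)(1., 2.)
    result.compute()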
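The module approach only works if every worker can actually import `zippy`. A small check along these lines may help when debugging; `module_is_importable` is a hypothetical helper name, not part of dask.

```python
import importlib.util


def module_is_importable(name):
    """Return True if a top-level module called `name` is found on this process's import path."""
    return importlib.util.find_spec(name) is not None


# Local sanity check; "ctypes" is used only because it ships with Python.
print(module_is_importable("ctypes"))
```

With a running client, `client.run(module_is_importable, "zippy")` executes the helper on every worker and returns a dict keyed by worker address, so a worker that cannot see the module shows up as `False`.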

The other solution, if you don't want to make a module, is to do all your C imports and definitions within the function, so that the C function is looked up on the worker rather than serialised.

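def zippy(x, y):
    # import inside the function so it does not depend on module-level state
    import ctypes
    import os

    # current_dir is the plain string defined in the question's script;
    # the library is loaded on the worker, when the task runs
    _mod = ctypes.cdll.LoadLibrary(os.path.join(current_dir, "zippy.so"))

    _zippy_sum = _mod.zippy_sum
    _zippy_sum.argtypes = [ctypes.c_double, ctypes.c_double]
    _zippy_sum.restype = ctypes.c_double

    z = _zippy_sum(x, y)

    return z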
