
How to pickle functions/classes defined in __main__ (python)

I would like to be able to pickle a function or class from within __main__, with the obvious problem (mentioned in other posts) that the pickled function/class is in the __main__ namespace and unpickling in another script/module will fail.

I have the following solution, which works. Is there a reason this should not be done?

The following is in myscript.py:

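import myscript  # re-import this same file under its real module name
import pickle

if __name__ == "__main__":
    # myclass is looked up on the imported module, so the pickle
    # refers to myscript.myclass rather than __main__.myclass
    print(pickle.dumps(myscript.myclass()))

else:
    # this branch runs only when the file is imported as "myscript"
    class myclass:
        pass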

Edit: The unpickling would be done in a script/module that has access to myscript.py and can do an import myscript. The aim is to use a solution like parallel python to call functions remotely, and be able to write a short, standalone script that contains the functions/classes that can be accessed remotely.
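To illustrate why going through import myscript helps, here is a minimal sketch written for Python 3. It assumes an in-memory stand-in module named myscript_demo (a hypothetical name, built with types.ModuleType so the example needs no file on disk). It only checks which module name ends up inside the pickle.

```python
import pickle
import sys
import types

# Hypothetical stand-in for myscript.py, built in memory so the sketch is
# self-contained; a real setup would simply `import myscript`.
module = types.ModuleType("myscript_demo")
exec("class myclass(object):\n    pass\n", module.__dict__)
sys.modules["myscript_demo"] = module

payload = pickle.dumps(module.myclass())

# The payload stores the defining module's name and the class name,
# not the class body, and it never mentions __main__.
print(b"myscript_demo" in payload)   # True
print(b"__main__" in payload)        # False

# Any process that can import the module can rebuild the instance.
restored = pickle.loads(payload)
print(type(restored) is module.myclass)  # True
```

Because the class is reached through the imported module rather than through the running script, the unpickling side only needs to be able to import that same module.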

You can get a better handle on global objects by importing __main__ and using the methods available in that module. This is what dill does in order to serialize almost anything in python. Basically, when dill serializes an interactively defined function, it uses some name mangling on __main__ on both the serialization and deserialization side that makes __main__ a valid module.

>>> import dill
>>> 
>>> def bar(x):
...   return foo(x) + x
... 
>>> def foo(x):
...   return x**2
... 
>>> bar(3)
12
>>> 
>>> _bar = dill.loads(dill.dumps(bar))
>>> _bar(3)
12

Actually, dill registers its types into the pickle registry, so if you have some black box code that uses pickle and you can't really edit it, then just importing dill can magically make it work without monkeypatching the 3rd party code.

Or, if you want the whole interpreter session sent over as a "python image", dill can do that too.

>>> # continuing from above
>>> dill.dump_session('foobar.pkl')
>>>
>>> ^D
dude@sakurai>$ python
Python 2.7.5 (default, Sep 30 2013, 20:15:49) 
[GCC 4.2.1 (Apple Inc. build 5566)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> dill.load_session('foobar.pkl')
>>> _bar(3)
12

You can easily send the image across ssh to another computer and start where you left off there, as long as there's version compatibility of pickle, subject to the usual caveats about python changing and things being installed.

I actually use dill to serialize objects and send them across parallel resources with parallel python, multiprocessing, and mpi4py. I roll these up conveniently into the pathos package (and pyina for MPI), which provides a uniform map interface for different parallel batch processing backends.

>>> # continued from above
>>> from pathos.multiprocessing import ProcessingPool as Pool
>>> Pool(4).map(foo, range(10))
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>>
>>> from pyina.launchers import MpiPool
>>> MpiPool(4).map(foo, range(10))
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

There are also non-blocking and iterative maps as well as non-parallel pipe connections. I also have a pathos module for pp; however, it is somewhat unstable for functions defined in __main__. I'm working on improving that. If you like, fork the code on github and help make pp better for functions defined in __main__.

The reason pp doesn't pickle well is that pp does its serialization tricks through using temporary file objects and reading the interpreter session's history, so it doesn't serialize objects in the same way that multiprocessing or mpi4py do. I have a dill module, dill.source, that seamlessly does the same type of pickling that pp uses, but it's rather new.

If you are trying to pickle something so that you can use it somewhere else, separate from test_script, that's not going to work, because pickle (apparently) just tries to load the function from the module. Here's an example:

test_script.py:

def my_awesome_function(x, y, z):
    return x + y + z

picklescript.py:

import pickle
import test_script
with open("awesome.pickle", "wb") as f:
    pickle.dump(test_script.my_awesome_function, f)

If you run python picklescript.py, then change the filename of test_script, loading the function will fail. For example, running this:

import pickle
with open("awesome.pickle", "rb") as f:
    pickle.load(f)

will give you the following traceback:

Traceback (most recent call last):
  File "load_pickle.py", line 3, in <module>
    pickle.load(f)
  File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/pickle.py", line 1378, in load
    return Unpickler(file).load()
  File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/pickle.py", line 1090, in load_global
    klass = self.find_class(module, name)
  File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/pickle.py", line 1124, in find_class
    __import__(module)
ImportError: No module named test_script
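The same behaviour can be reproduced without touching any files. The following is a minimal sketch written for Python 3; it assumes an in-memory module named test_script_demo (a hypothetical stand-in for test_script.py) and simulates the rename by removing that module from sys.modules.

```python
import pickle
import sys
import types

# Hypothetical in-memory stand-in for test_script.py.
source = "def my_awesome_function(x, y, z):\n    return x + y + z\n"
demo = types.ModuleType("test_script_demo")
exec(source, demo.__dict__)
sys.modules["test_script_demo"] = demo

# Pickling a function records only its module name and its own name.
payload = pickle.dumps(demo.my_awesome_function)

# While the module is importable, loading hands back the very same function.
same_object = pickle.loads(payload) is demo.my_awesome_function

# Simulate renaming the file: the module can no longer be found.
del sys.modules["test_script_demo"]
try:
    pickle.loads(payload)
    load_error = None
except ImportError as exc:
    load_error = exc

print(same_object)
print(repr(load_error))
```

The load fails at import time, before any function code is involved, which matches the find_class frame in the traceback above.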

Pickle seems to look at the main scope for definitions of classes and functions. From inside the module you're unpickling from, try this:

import myscript
import __main__
__main__.myclass = myscript.myclass
#unpickle anywhere after this
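A minimal sketch of that aliasing trick in a single process, written for Python 3. The class name DemoPayload is hypothetical, and its __module__ is set to "__main__" by hand to mimic a class that was defined in the script that produced the pickle.

```python
import pickle
import __main__

# Hypothetical class that reports __main__ as its home, as it would if it
# had been defined in the script that created the pickle.
payload_cls = type("DemoPayload", (object,), {"__module__": "__main__"})
__main__.DemoPayload = payload_cls

obj = payload_cls()
obj.value = 42
data = pickle.dumps(obj)

# Simulate a different program whose __main__ does not define the class.
del __main__.DemoPayload
try:
    pickle.loads(data)
    failed_without_alias = False
except (AttributeError, pickle.UnpicklingError):
    failed_without_alias = True

# Alias the class onto __main__, as in the snippet above, then load again.
__main__.DemoPayload = payload_cls
restored = pickle.loads(data)

print(failed_without_alias)  # True
print(restored.value)        # 42
```

This only works if the alias is in place before the unpickling call, and it puts names into the loading program's __main__ namespace, so name clashes are possible.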
