
Serializing an object in __main__ with pickle or dill

I have a pickling problem. I want to serialize a function in my main script, then load it and run it in another script. To demonstrate this, I've made 2 scripts:

Attempt 1: The naive way

dill_pickle_script_1.py

import pickle
import time

def my_func(a, b):
    time.sleep(0.1)  # The purpose of this will become evident at the end
    return a+b

if __name__ == '__main__':
    with open('testfile.pkl', 'wb') as f:
        pickle.dump(my_func, f)

dill_pickle_script_2.py

import pickle

if __name__ == '__main__':
    with open('testfile.pkl', 'rb') as f:  # pickles are binary; open in 'rb' mode
        func = pickle.load(f)
        assert func(1, 2) == 3

Problem: when I run script 2, I get AttributeError: 'module' object has no attribute 'my_func'. I understand why: pickle stores a function by reference (its module name plus its own name), not by its code, and when my_func is serialized in script 1 it belongs to the __main__ module. dill_pickle_script_2 can't know that __main__ there referred to the namespace of dill_pickle_script_1, so it looks for my_func in its own __main__ and cannot find it.
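The failure can be reproduced in a single process. The sketch below assumes nothing beyond the standard library: it builds a throwaway module named fake_main (a hypothetical name standing in for script 1's __main__), pickles a function from it, and then removes the function, which is effectively what script 2 sees when it resolves the reference against its own __main__.

```python
import pickle
import sys
import types

# Throwaway module standing in for script 1's __main__ ("fake_main" is hypothetical).
mod = types.ModuleType("fake_main")
exec("def my_func(a, b):\n    return a + b\n", mod.__dict__)
sys.modules["fake_main"] = mod

payload = pickle.dumps(mod.my_func, protocol=0)
# Protocol 0 records the function as a text reference: module name + function name.
print(payload)

# Simulate the loading side: the module resolves, but it has no my_func in it.
del mod.my_func
load_error = None
try:
    pickle.loads(payload)
except AttributeError as exc:
    load_error = exc
    print("AttributeError:", exc)
```

Only the name travels in the pickle, so whatever module the loader finds under that name must actually define my_func.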

Attempt 2: Inserting an absolute import

I fixed the problem with a little hack: before pickling, I add an absolute import of my_func in dill_pickle_script_1.

dill_pickle_script_1.py

import pickle
import time

def my_func(a, b):
    time.sleep(0.1)
    return a+b

if __name__ == '__main__':
    from dill_pickle_script_1 import my_func  # Added absolute import
    with open('testfile.pkl', 'wb') as f:
        pickle.dump(my_func, f)

Now it works! However, I'd like to avoid having to do this hack every time. (Also, I want the pickling to be done inside some other module, which wouldn't know which module my_func came from.)
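Attempt 2 works because the imported my_func belongs to a module with a real, importable name, so the pickle reference points at that module instead of __main__. A minimal sketch of the same effect, assuming a hypothetical module demo_funcs written to a temporary directory and put on sys.path:

```python
import importlib
import os
import pickle
import sys
import tempfile

# Write a tiny module to a temp directory so it can be imported by name.
# "demo_funcs" is a hypothetical module name used only for this sketch.
tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, "demo_funcs.py"), "w") as f:
    f.write(
        "import time\n"
        "\n"
        "def my_func(a, b):\n"
        "    time.sleep(0.1)\n"
        "    return a + b\n"
    )
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

# Same idea as the absolute import in Attempt 2: my_func now lives in a named module.
my_func = importlib.import_module("demo_funcs").my_func

payload = pickle.dumps(my_func)
# The reference names demo_funcs rather than __main__.
assert b"demo_funcs" in payload

# Loading imports demo_funcs, which imports time itself, so the call succeeds.
restored = pickle.loads(payload)
print(restored(1, 2))
```

Note that the time dependency is handled for free here: the loader imports the defining module, and that module's own import statements run.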

Attempt 3: Dill

I heard that the package dill lets you serialize things in __main__ and load them elsewhere. So I tried that:

dill_pickle_script_1.py

import dill
import time

def my_func(a, b):
    time.sleep(0.1)
    return a+b

if __name__ == '__main__':
    with open('testfile.pkl', 'wb') as f:
        dill.dump(my_func, f)

dill_pickle_script_2.py

import dill

if __name__ == '__main__':
    with open('testfile.pkl', 'rb') as f:  # pickles are binary; open in 'rb' mode
        func = dill.load(f)
        assert func(1, 2) == 3

Now, however, I have another problem: when running dill_pickle_script_2.py, I get NameError: global name 'time' is not defined. It seems that dill did not realize that my_func references the time module, which has to be imported on load.
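To check which global names my_func actually depends on (the ones that must exist when it runs on the loading side), you can inspect its code object with plain stdlib introspection. This is only a diagnostic sketch, not how dill itself tracks dependencies:

```python
import time
import types


def my_func(a, b):
    time.sleep(0.1)
    return a + b


# co_names holds the global and attribute names the function's bytecode looks up.
referenced = my_func.__code__.co_names

# Keep the names that resolve to modules in the function's global namespace.
module_deps = [
    name
    for name in referenced
    if isinstance(my_func.__globals__.get(name), types.ModuleType)
]
print(module_deps)
```

Any module listed here has to be importable and bound to the same name when the function is called after unpickling.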

My question

How can I serialize an object in __main__ and load it again in another script, so that all the imports used by that object are also loaded, without the nasty little hack from Attempt 2?

Well, I found a solution. It is a horrible but tidy kludge and is not guaranteed to work in all cases; any suggestions for improvement are welcome. It replaces the __main__ references in the pickle string with an absolute module reference, using the following helper functions:

import os
import pickle
import sys

def pickle_dumps_without_main_refs(obj):
    """
    Yeah this is horrible, but it allows you to pickle an object in the main module so that it can be reloaded in another
    module.
    :param obj:
    :return:
    """
    currently_run_file = sys.argv[0]
    module_path = file_path_to_absolute_module(currently_run_file)
    pickle_str = pickle.dumps(obj, protocol=0)  # protocol 0 writes module names as plain text
    pickle_str = pickle_str.replace(b'__main__', module_path.encode())  # Hack! Replaces every occurrence
    return pickle_str


def pickle_dump_without_main_refs(obj, file_obj):
    string = pickle_dumps_without_main_refs(obj)
    file_obj.write(string)


def file_path_to_absolute_module(file_path):
    """
    Given a file path, return an import path.
    :param file_path: A file path.
    :return:
    """
    assert os.path.exists(file_path)
    file_loc, ext = os.path.splitext(file_path)
    assert ext in ('.py', '.pyc')
    directory, module = os.path.split(file_loc)
    module_path = [module]
    while True:
        if os.path.exists(os.path.join(directory, '__init__.py')):
            directory, package = os.path.split(directory)
            module_path.append(package)
        else:
            break
    path = '.'.join(module_path[::-1])
    return path
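The core of this workaround is that a protocol-0 pickle stores the module name as plain bytes, so a byte substitution retargets the reference. The sketch below isolates that step with two throwaway modules, fake_main (standing in for __main__) and real_module (standing in for the importable module path); both names are hypothetical. Because the substitution is a blind replace, it would also rewrite any other occurrence of the old name inside the pickle, for example a string value in the pickled data.

```python
import pickle
import sys
import types


def make_module(name, source):
    """Create and register a throwaway module (hypothetical names, demo only)."""
    mod = types.ModuleType(name)
    exec(source, mod.__dict__)
    sys.modules[name] = mod
    return mod


SRC = "def my_func(a, b):\n    return a + b\n"
writer = make_module("fake_main", SRC)         # plays the role of __main__ in script 1
reader_side = make_module("real_module", SRC)  # the module script 2 can import

payload = pickle.dumps(writer.my_func, protocol=0)
# Retarget the reference from fake_main.my_func to real_module.my_func.
patched = payload.replace(b"fake_main", b"real_module")

func = pickle.loads(patched)
print(func(1, 2))
```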

Now I can simply change dill_pickle_script_1.py to:

import time
from artemis.remote.child_processes import pickle_dump_without_main_refs


def my_func(a, b):
    time.sleep(0.1)
    return a+b

if __name__ == '__main__':
    with open('testfile.pkl', 'wb') as f:
        pickle_dump_without_main_refs(my_func, f)

And then dill_pickle_script_2.py works!

You can use dill.dump with recurse=True, or set dill.settings["recurse"] = True. With recurse enabled, dill recursively traces the objects referred to in the function's globals (here, time) and pickles them, instead of trying to store the entire global dictionary:

In file A:

import time
import dill

def my_func(a, b):
    time.sleep(0.1)
    return a + b

with open("tmp.pkl", "wb") as f:
    dill.dump(my_func, f, recurse=True)

In file B:

import dill

with open("tmp.pkl", "rb") as f:
    my_func = dill.load(f)

assert my_func(1, 2) == 3
