
Python pickling after changing a module's directory

I've recently changed my program's directory layout. Before, I had all my modules inside the "main" folder. Now I've moved them into a directory named after the program and placed an __init__.py there to make a package.

Now I have a single .py file in my main directory that is used to launch my program, which is much neater.

Anyway, loading pickled files from previous versions of my program now fails. I'm getting "ImportError: No module named tools", which I guess is because my module was previously in the main folder and is now whyteboard.tools, not plain tools. However, the code that imports the tools module lives in the same directory as it, so I doubt there's a need to specify a package.

So, my program directory looks something like this:

whyteboard-0.39.4

-->whyteboard.py

-->README.txt

-->CHANGELOG.txt

---->whyteboard/

---->whyteboard/__init__.py

---->whyteboard/gui.py

---->whyteboard/tools.py

whyteboard.py launches a block of code from whyteboard/gui.py that fires up the GUI. This pickling problem definitely wasn't happening before the directory reorganization.

As pickle's docs say, in order to save and restore a class instance (or a function, for that matter), you must respect certain constraints:

pickle can save and restore class instances transparently, however the class definition must be importable and live in the same module as when the object was stored

whyteboard.tools is not "the same module as" tools. Even though other modules in the same package can import it with import tools (through Python 2's implicit relative imports), it ends up in sys.modules as sys.modules['whyteboard.tools']. This is absolutely crucial: otherwise the same module imported from inside its own package and from another package would end up with multiple, possibly conflicting entries!
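To check which module names a given pickle actually refers to, the standard-library pickletools module can walk its opcode stream. What follows is only a minimal sketch and not part of the original answer: referenced_modules is a hypothetical helper, fractions.Fraction stands in for a class that used to live in tools, and only the GLOBAL opcode (used by protocols 0 to 3) is decoded.

```python
import pickle
import pickletools
from fractions import Fraction


def referenced_modules(pickled_bytes):
    """Return the module names named by GLOBAL opcodes in a pickle.

    Hypothetical helper: pickles written with protocol 4 or later use
    STACK_GLOBAL instead, which this sketch does not decode.
    """
    modules = set()
    for opcode, arg, _pos in pickletools.genops(pickled_bytes):
        if opcode.name == "GLOBAL":
            # For GLOBAL, arg is the string "module_name class_name".
            module_name, _, _class_name = arg.partition(" ")
            modules.add(module_name)
    return modules


# Fraction stands in for a class defined in the old top-level module.
data = pickle.dumps(Fraction(1, 2), protocol=2)
modules = referenced_modules(data)
print(modules)
```

Running something like this over old save files before migrating them would show whether they mention tools or whyteboard.tools.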

If your pickle files are in a good, modern binary format (as opposed to the old ASCII format, which is the default only for compatibility reasons), migrating them after such a change may not be as trivial as "editing the file" (which is binary, etc.), despite what another answer suggests. I suggest instead that you write a little "pickle-migrating script" that patches sys.modules like this:

import sys
from whyteboard import tools

sys.modules['tools'] = tools

and then cPickle.load each file, del sys.modules['tools'], and cPickle.dump each loaded object back to its file (in Python 3, cPickle is simply pickle). The temporary extra entry in sys.modules should let the pickles load successfully, and dumping them again should then use the right module name for the instances' classes (removing the extra entry makes sure of that).
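The load, delete, dump cycle can be tried without touching any real save files. This is a hedged, self-contained sketch rather than the actual migration script: the made-up module name old_fractions plays the role of tools, the standard-library fractions module plays the role of whyteboard.tools, and the "old" pickle is written by hand in protocol 0.

```python
import fractions
import pickle
import sys

# Hand-written protocol-0 pickle that refers to a class by a module name
# that no longer exists. "old_fractions" is a made-up stand-in for "tools".
old_pickle = b"cold_fractions\nFraction\n(I1\nI2\ntR."

# Without the alias, loading fails like the ImportError in the question.
try:
    pickle.loads(old_pickle)
    load_failed = False
except ImportError:
    load_failed = True

# Temporarily expose the real module under its old name.
sys.modules["old_fractions"] = fractions
try:
    obj = pickle.loads(old_pickle)
finally:
    del sys.modules["old_fractions"]

# Dumping again records the class's current module name.
new_pickle = pickle.dumps(obj, protocol=2)
print(load_failed, obj, b"old_fractions" in new_pickle)
```

The try/finally is my own addition, so the alias is removed even if a file fails to load.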

This happened to me too. I solved it by adding the new location of the module to sys.path before loading the pickle:

import pickle
import sys

# Make the moved module importable under its old top-level name.
sys.path.append('path/to/whyteboard')

with open("pickled_file", "rb") as f:
    obj = pickle.load(f)
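One thing to be aware of with the sys.path approach is which module the loaded objects end up belonging to. The sketch below is an illustration under assumptions, not the poster's code: it builds a throwaway directory with a made-up module demo_tools and a hand-written pickle that refers to it by its top-level name.

```python
import os
import pickle
import sys
import tempfile

# Build a throwaway package directory containing one module. The names
# "pkg_dir" and "demo_tools" are made up for this example.
pkg_dir = tempfile.mkdtemp()
with open(os.path.join(pkg_dir, "demo_tools.py"), "w") as f:
    f.write("class Pen(object):\n    pass\n")

# Pickle that refers to the class by its old top-level module name.
old_pickle = b"cdemo_tools\nPen\n(tR."

# Same fix as in the answer: put the package directory on sys.path.
sys.path.append(pkg_dir)
pen = pickle.loads(old_pickle)

# The instance belongs to the top-level module, not to a package.
print(type(pen).__module__)
```

If the rest of the program imports the same file as whyteboard.tools, Python then holds two separate copies of the module, so isinstance checks against the package's classes would fail for the loaded objects.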

This can be done with a custom unpickler that overrides find_class():

import io
import pickle


class RenameUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        renamed_module = module
        if module == "tools":
            renamed_module = "whyteboard.tools"

        return super(RenameUnpickler, self).find_class(renamed_module, name)


def renamed_load(file_obj):
    return RenameUnpickler(file_obj).load()


def renamed_loads(pickled_bytes):
    file_obj = io.BytesIO(pickled_bytes)
    return renamed_load(file_obj)

Then you'd need to use renamed_load() instead of pickle.load() and renamed_loads() instead of pickle.loads().
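If more than one module has moved, the same idea extends to a lookup table. The following is a sketch under assumptions, not a drop-in replacement: MODULE_RENAMES, MappingUnpickler and mapped_loads are hypothetical names, and the made-up old_fractions module name stands in for tools so that the example runs on its own.

```python
import io
import pickle
from fractions import Fraction

# Hypothetical rename table: old module name -> new module name.
# For the question's layout it would be {"tools": "whyteboard.tools"}.
MODULE_RENAMES = {"old_fractions": "fractions"}


class MappingUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Fall back to the recorded name when no rename is registered.
        module = MODULE_RENAMES.get(module, module)
        return super().find_class(module, name)


def mapped_loads(pickled_bytes):
    return MappingUnpickler(io.BytesIO(pickled_bytes)).load()


# Hand-written protocol-0 pickle that still uses the old module name.
old_pickle = b"cold_fractions\nFraction\n(I1\nI2\ntR."
restored = mapped_loads(old_pickle)
print(restored)
```

This leaves the files on disk untouched. They would only pick up the new module name once the loaded objects are dumped again.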

pickle serializes classes by reference, so if you change where the class lives, the pickle will not load because the class cannot be found. If you use dill instead of pickle, you can serialize classes either by reference or directly (by serializing the class itself instead of its import path). You can simulate this pretty easily by changing the class definition after a dump and before a load.

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> 
>>> class Foo(object):
...   def bar(self):
...     return 5
... 
>>> f = Foo()
>>> 
>>> _f = dill.dumps(f)
>>> 
>>> class Foo(object):
...   def bar(self, x):
...     return x
... 
>>> g = Foo()
>>> f_ = dill.loads(_f)
>>> f_.bar()
5
>>> g.bar(4)
4

This is the normal behavior of pickle: unpickled objects need to have their defining module importable.

You should be able to change the module path (i.e. from tools to whyteboard.tools) by editing the pickled files, provided they were written with the text-based protocol 0 (the Python 2 default), in which case they are simple text files.
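For a protocol-0 pickle, the edit amounts to rewriting one line. Here is a small sketch with stand-in names: old_fractions is a made-up module name playing the part of tools, and the pickle is written by hand so the example does not depend on any old save file.

```python
import pickle
from fractions import Fraction

# Protocol-0 pickle naming a class in a module that has since been renamed.
# "old_fractions" is a made-up stand-in for "tools".
old_pickle = b"cold_fractions\nFraction\n(I1\nI2\ntR."

# In protocol 0 a class reference is the line "c<module>" followed by the
# class name, so the module line can be rewritten like ordinary text.
edited = old_pickle.replace(b"cold_fractions\n", b"cfractions\n")

value = pickle.loads(edited)
print(value)
```

Pickles written with protocol 4 or later store these names as length-prefixed strings inside framed chunks, so a plain search-and-replace with a name of a different length is likely to corrupt them.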

When you load a pickle file that contains a class reference, the module structure must be the same as when the pickle was saved. If you want to use the pickle somewhere else, you have to tell Python where the class or other object lives. Doing the following can save the day:

import sys
sys.path.append('path/to/folder containing the python module')

For people like me who need to update lots of pickle dumps, here's a function implementing @Alex Martelli's excellent advice:

import sys
from types import ModuleType
import pickle

# import torch

def update_module_path_in_pickled_object(
    pickle_path: str, old_module_path: str, new_module: ModuleType
) -> None:
    """Update a python module's dotted path in a pickle dump if the
    corresponding file was renamed.

    Implements the advice in https://stackoverflow.com/a/2121918.

    Args:
        pickle_path (str): Path to the pickled object.
        old_module_path (str): The old.dotted.path.to.renamed.module.
        new_module (ModuleType): from new.location import module.
    """
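    # Temporarily alias the old dotted path to the relocated module.
    sys.modules[old_module_path] = new_module

    with open(pickle_path, "rb") as file:
        dic = pickle.load(file)
    # dic = torch.load(pickle_path, map_location="cpu")

    # Remove the alias so the dump records the new module path.
    del sys.modules[old_module_path]

    with open(pickle_path, "wb") as file:
        pickle.dump(dic, file)
    # torch.save(dic, pickle_path)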

In my case, the dumps were PyTorch model checkpoints, hence the commented-out torch.load() and torch.save() calls.

Example

from new.location import new_module

for pickle_path in ('foo.pkl', 'bar.pkl'):
    update_module_path_in_pickled_object(
        pickle_path, "old.module.dotted.path", new_module
    )


 