AttributeError when reading a pickle file

Question

I get the following error when I read my .pkl files in Spyder (Python 3.6.5):

IN: with open(file, "rb") as f:
       data = pickle.load(f)  

Traceback (most recent call last):

 File "<ipython-input-5-d9796b902b88>", line 2, in <module>
   data = pickle.load(f)

AttributeError: Can't get attribute 'Signal' on <module '__main__' from 'C:\\Python36\\lib\\site-packages\\spyder\\utils\\ipython\\start_kernel.py'>

The context:

My program consists of a single file, program.py, which defines a class Signal as well as many functions. A simplified overview of the program is provided below:

import numpy as np
import _pickle as pickle
import os
from os.path import join

# The unique class
class Signal:
    def __init__(self, fq, t0, tf):
        self.fq = fq
        self.t0 = t0
        self.tf = tf
        self.timeline = np.round(np.arange(t0, tf, 1/fq*1000), 3)

# The functions
def write_file(data, folder_path, file_name):
    with open(join(folder_path, file_name), "wb") as output:
        pickle.dump(data, output, -1)

def read_file(folder_path, file_name):
    with open(join(folder_path, file_name), "rb") as input:
        data= pickle.load(input)
    return data

def compute_data(*parameters):
    # do stuff
    pass

The function compute_data returns a list of tuples of the form:

data = [((Signal_1_1, Signal_1_2, ...), val 1), ((Signal_2_1, Signal_2_2, ...), val 2)...]

Each Signal_i_k is, of course, a Signal object. This list is saved in .pkl format. Moreover, I run many iterations with different parameters for the compute_data function. Many iterations use previously computed data as a starting point, and therefore read the corresponding .pkl files.

Finally, I'm using several computers at the same time, each of them saving the computed data on the local network. Thus each computer can access the data generated by the others and use it as a starting point.

Back to the error:

My main issue is that I never get this error when I start my program by double-clicking the file or from the Windows cmd or PowerShell. The program never crashes with this error and runs without apparent issues.

However, I cannot read a .pkl file in Spyder. Every time I try, the error is thrown.

Any idea why I get this weird behavior?

Thanks!

Answer 1

When you dump stuff in a pickle you should avoid pickling classes and functions declared in the main module. Your problem is (in part) because you only have one file in your program. pickle is lazy and does not serialize class definitions or function definitions. Instead it saves a reference to how to find the class (the module it lives in and its name).
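A minimal sketch of what "saves a reference" means in practice. The Signal class here is a cut-down, hypothetical stand-in for the one in the question, and the describe method exists only to show that method code does not end up in the pickle.

```python
import pickle


class Signal:
    """Hypothetical, cut-down stand-in for the class in the question."""

    def __init__(self, fq):
        self.fq = fq

    def describe(self):
        return "sampling frequency: {}".format(self.fq)


payload = pickle.dumps(Signal(250))

# The payload names the defining module and the class...
print(Signal.__module__.encode() in payload)  # True
print(b"Signal" in payload)                   # True
# ...and carries the instance attributes, but none of the class's code.
print(b"describe" in payload)                 # False
```

Whoever loads the payload therefore has to be able to import the same module name and find the same class name in it.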

When Python runs a script/file directly it runs the program as the __main__ module (regardless of its actual file name). However, when a file is loaded and is not the main module (e.g. when you do something like import program) then its module name is based on its file name. So program.py gets called program.
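The naming rule can be checked with a small experiment. This sketch writes a throwaway program.py that only prints its own module name, then runs it once as a script and once through an import; the temporary directory and the one-line file are assumptions made for the demonstration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # A throwaway program.py whose only job is to report its module name.
    Path(tmp, "program.py").write_text("print(__name__)\n")

    # Run directly: the file becomes the __main__ module.
    as_script = subprocess.run(
        [sys.executable, "program.py"],
        cwd=tmp, capture_output=True, text=True, check=True,
    ).stdout.strip()

    # Imported from other code: the module is named after the file.
    as_import = subprocess.run(
        [sys.executable, "-c", "import program"],
        cwd=tmp, capture_output=True, text=True, check=True,
    ).stdout.strip()

print(as_script)  # __main__
print(as_import)  # program
```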

When you are running from the command line you are doing the former, and the module is called __main__. As such, pickle creates references to your classes like __main__.Signal. When Spyder tries to load the pickle file it gets told to import __main__ and look for Signal. But Spyder's __main__ module is the module that is used to start Spyder and not your program.py, and so pickle fails to find Signal.

You can inspect the contents of a pickle file by running the command below (-a prints a description of each opcode). From the output you will see that your class is referenced as __main__.Signal.
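The failure can be reproduced in a single process. This is only a simulation under stated assumptions: Signal is a hypothetical stand-in, and temporarily deleting the class from its module imitates a process (such as the Spyder kernel) whose module of that name never defined it.

```python
import pickle
import sys


class Signal:
    """Hypothetical stand-in for the class in the question."""

    def __init__(self, fq):
        self.fq = fq


payload = pickle.dumps(Signal(250))

# Imitate a process whose module of the same name has no Signal attribute.
module = sys.modules[Signal.__module__]
saved_class = module.Signal
del module.Signal

error = None
try:
    pickle.loads(payload)
except AttributeError as exc:
    error = str(exc)
    print(error)
finally:
    # Put the class back so the rest of the session is unaffected.
    module.Signal = saved_class
```

The printed message has the same shape as the one in the question: pickle imported the module it was told to import and could not find Signal on it.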

python -m pickletools -a file.pkl

And you'll see something like:

    0: \x80 PROTO      3              Protocol version indicator.
    2: c    GLOBAL     '__main__ Signal' Push a global object (module.attr) on the stack.
   19: q    BINPUT     0                 Store the stack top into the memo.  The stack is not popped.
   21: )    EMPTY_TUPLE                  Push an empty tuple.
   22: \x81 NEWOBJ                       Build an object instance.
   23: q    BINPUT     1                 Store the stack top into the memo.  The stack is not popped.
   ...
   51: b    BUILD                        Finish building an object, via __setstate__ or dict update.
   52: .    STOP                         Stop the unpickling machine.
highest protocol among opcodes = 2
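The same check can be done from Python with pickletools.genops, which walks the opcodes without importing anything or executing the pickle. The helper name referenced_globals is made up for this sketch, and its handling of STACK_GLOBAL (used from protocol 4 onwards) is a simplification that only follows the string pushes and memo lookups a standard pickler emits.

```python
import pickle
import pickletools


def referenced_globals(payload):
    """Return the (module, name) pairs a pickle asks the loader to import."""
    found = []
    memo = []    # values stored by MEMOIZE, in order (protocol 4+)
    pushed = []  # strings pushed so far; None for any other opcode
    for opcode, arg, _pos in pickletools.genops(payload):
        name = opcode.name
        if name == "GLOBAL":
            # Protocols 0-3: module and name arrive together as "module name".
            found.append(tuple(arg.split(" ", 1)))
            pushed.append(None)
        elif name == "STACK_GLOBAL":
            # Protocol 4+: module and name are the two values pushed just before.
            found.append((pushed[-2], pushed[-1]))
            pushed.append(None)
        elif name == "MEMOIZE":
            memo.append(pushed[-1] if pushed else None)
        elif name in ("BINGET", "LONG_BINGET"):
            pushed.append(memo[arg] if arg < len(memo) else None)
        else:
            pushed.append(arg if isinstance(arg, str) else None)
    return found


class Signal:
    """Hypothetical stand-in for the class in the question."""

    def __init__(self, fq):
        self.fq = fq


for protocol in (2, 4):
    print(protocol, referenced_globals(pickle.dumps(Signal(250), protocol=protocol)))
```

Run as a script, the module reported for Signal is __main__, which is exactly the reference that later fails to resolve in another process.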

Solutions

There are a number of solutions available to you:

1. Don't serialise instances of classes that are defined in your __main__ module. The easiest and best solution. Instead move these classes to another module, or write a main.py script to invoke your program (both will mean such classes are no longer found in the __main__ module).
2. Write a custom deserialiser
3. Write a custom serialiser

The following solutions will be working with a pickle file called out.pkl created by the following code (in a file called program.py):
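The first option can be sketched as a two-file layout. The file names signals.py and main.py and the cut-down Signal class are assumptions for illustration; the sketch writes both files to a temporary directory, runs main.py as a script, and inspects the pickle it produces.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# signals.py: the class lives in its own importable module (hypothetical name).
SIGNALS_PY = '''
class Signal:
    def __init__(self, fq):
        self.fq = fq
'''

# main.py: the entry point only imports the class, it does not define it.
MAIN_PY = '''
import pickle
from signals import Signal

with open("out.pkl", "wb") as f:
    pickle.dump(Signal(250), f)
'''

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "signals.py").write_text(SIGNALS_PY)
    Path(tmp, "main.py").write_text(MAIN_PY)
    subprocess.run([sys.executable, "main.py"], cwd=tmp, check=True)
    payload = Path(tmp, "out.pkl").read_bytes()

# The pickle now refers to signals.Signal, not __main__.Signal.
print(b"signals" in payload)   # True
print(b"__main__" in payload)  # False
```

Any process that can import signals, including an IDE console, can then load the file with a plain pickle.load.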

import pickle

class MyClass:
    def __init__(self, name):
        self.name = name

if __name__ == '__main__':
    o = MyClass('test')
    with open('out.pkl', 'wb') as f:
        pickle.dump(o, f)

The Custom Deserialiser Solution

You can write a custom deserialiser that knows that when it encounters a reference to the __main__ module, what you really mean is the program module.

import pickle

class MyCustomUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == "__main__":
            module = "program"
        return super().find_class(module, name)

with open('out.pkl', 'rb') as f:
    unpickler = MyCustomUnpickler(f)
    obj = unpickler.load()

print(obj)
print(obj.name)

This is the easiest way to load pickle files that have already been created. The problem is that it pushes the responsibility on to the deserialising code, when it should really be the responsibility of the serialising code to create pickle files correctly.

The Custom Serialisation Solution

In contrast to the previous solution, you can make sure that serialised pickle objects can be deserialised easily by anyone without having to know the custom deserialisation logic. To do this you can use the copyreg module to inform pickle how to deserialise various classes. So here, what you would do is tell pickle to deserialise all instances of __main__ classes as if they were instances of program classes. You will need to register a custom serialiser for each class.
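One way to limit that drawback is to use the remapping unpickler once, as a migration step, and write the data back out with plain pickle so later readers need nothing special. The sketch below works on bytes in memory and makes several assumptions: RemappingUnpickler and migrate are made-up names, the program module is a stand-in built on the fly so the example runs on its own, and the "legacy" payload is faked by renaming the module in a protocol 2 pickle.

```python
import io
import pickle
import sys
import types


class RemappingUnpickler(pickle.Unpickler):
    """Unpickler that redirects lookups from old module names to new ones."""

    def __init__(self, file, renames):
        super().__init__(file)
        self._renames = renames

    def find_class(self, module, name):
        return super().find_class(self._renames.get(module, module), name)


def migrate(payload, renames):
    """Load an old pickle with remapping, then re-serialise it normally."""
    obj = RemappingUnpickler(io.BytesIO(payload), renames).load()
    return pickle.dumps(obj)


# Stand-in for the real program.py so this sketch is self-contained.
program = types.ModuleType("program")
exec(
    "class MyClass:\n"
    "    def __init__(self, name):\n"
    "        self.name = name\n",
    program.__dict__,
)
sys.modules["program"] = program

# Fake a file written by a script run as __main__. In protocol 2 the GLOBAL
# opcode stores newline-terminated text, so the module name can be swapped.
legacy = pickle.dumps(program.MyClass("test"), protocol=2).replace(
    b"cprogram\n", b"c__main__\n"
)

fixed = migrate(legacy, {"__main__": "program"})
restored = pickle.loads(fixed)  # plain pickle, no custom unpickler needed
print(type(restored).__module__, restored.name)  # program test
```

For real files the same idea applies with open(..., "rb") and open(..., "wb") around the two steps, provided the program module is importable where the migration runs.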

import program
import pickle
import copyreg

class MyClass:
    def __init__(self, name):
        self.name = name

def pickle_MyClass(obj):
    assert type(obj) is MyClass
    return program.MyClass, (obj.name,)

copyreg.pickle(MyClass, pickle_MyClass)

if __name__ == '__main__':
    o = MyClass('test')
    with open('out.pkl', 'wb') as f:
        pickle.dump(o, f)
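A self-contained sketch for checking that a registered reducer really redirects the reference. The program module here is a stand-in created on the fly (an assumption so the example runs without a program.py on disk); the reducer has the same shape as pickle_MyClass above.

```python
import copyreg
import pickle
import sys
import types

# Stand-in for the importable program module (assumption for this sketch).
program = types.ModuleType("program")
exec(
    "class MyClass:\n"
    "    def __init__(self, name):\n"
    "        self.name = name\n",
    program.__dict__,
)
sys.modules["program"] = program


class MyClass:
    """The copy of the class that lives in the running script's own module."""

    def __init__(self, name):
        self.name = name


def pickle_MyClass(obj):
    # Tell pickle to rebuild the object by calling program.MyClass(obj.name).
    return program.MyClass, (obj.name,)


copyreg.pickle(MyClass, pickle_MyClass)

payload = pickle.dumps(MyClass("test"))
restored = pickle.loads(payload)

print(b"program" in payload)              # True
print(type(restored) is program.MyClass)  # True
print(restored.name)                      # test
```

Note that a reducer of this form only carries the constructor arguments. For the Signal class in the question that would presumably mean returning (obj.fq, obj.t0, obj.tf), and any attribute set outside __init__ would need to be handled separately.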

Answer 2

I think the dill module, which extends Python's pickle, could be an option. It does not need the module path, such as __main__.

Just replace pickle with dill.

import dill
from os.path import join

# The functions
def write_file(data, folder_path, file_name):
    with open(join(folder_path, file_name), "wb") as output:
        dill.dump(data, output)

def read_file(folder_path, file_name):
    with open(join(folder_path, file_name), "rb") as input:
        data= dill.load(input)
    return data

Answer 1: score 27, accepted, 2018-05-22 16:48:25
Answer 2: score 2, 2021-03-25 03:47:36
