
Decode pickle file in human-readable format in Python

import pickle

data_pkl = open("data.pkl", "rb")
d_c = data_pkl.read()
data_pkl.close()
print(d_c)

When data is dumped, pickle produces a bytes string. That is what you have: reading the file with read() gives you exactly those raw bytes, not the object they encode.

For instance:

import pickle

data = {'text': 'value', 'list': [1, 2, 3]}

s = pickle.dumps(data)
print(s)

This produces the bytes string:

b'\x80\x03}q\x00(X\x04\x00\x00\x00textq\x01X\x05\x00\x00'
b'\x00valueq\x02X\x04\x00\x00\x00listq\x03]q\x04(K\x01K'
b'\x02K\x03eu.'

Note: I split the long line into three parts for readability.

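Python defines several pickle protocols; the constants pickle.HIGHEST_PROTOCOL and pickle.DEFAULT_PROTOCOL give the highest supported protocol and the one used by default. If you change the protocol, you get a different byte string. (The output above starts with b'\x80\x03', i.e. protocol 3, which was the default up to Python 3.7; since Python 3.8 the default is protocol 4, so your bytes may look different.)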
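To see the effect of the protocol, here is a small sketch that dumps the same dict with every protocol your Python supports. The exact lengths and bytes depend on your Python version, so I only print them rather than claim specific values.

```python
import pickle

data = {'text': 'value', 'list': [1, 2, 3]}

print("default protocol:", pickle.DEFAULT_PROTOCOL)
print("highest protocol:", pickle.HIGHEST_PROTOCOL)

# Dump the same object with every supported protocol.
dumps_by_protocol = {
    proto: pickle.dumps(data, protocol=proto)
    for proto in range(pickle.HIGHEST_PROTOCOL + 1)
}

for proto, raw in dumps_by_protocol.items():
    # Protocols 2 and up start with the PROTO opcode (b'\x80') followed by
    # the protocol number; protocols 0 and 1 have no such header.
    print(proto, len(raw), raw[:12])
    # Whatever the protocol, loads() gives back an equal object.
    assert pickle.loads(raw) == data
```

The first two bytes are a quick way to tell which protocol a pickle file was written with (for protocols 2 and up).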

To turn these bytes back into an object, use pickle.load to read from a file object, or pickle.loads to read from a bytes string.

For instance:

import pprint

obj = pickle.loads(s)
pprint.pprint(obj)

You get:

{'list': [1, 2, 3], 'text': 'value'}

Cool, but if your data contains an instance of a class that can't be found at load time, you won't be able to deserialize it: pickle stores only a reference to the class (its module and name), not the class definition itself.

Here is an example:

import pickle
import pprint


class UnknownClass:
    def __init__(self, value):
        self.value = value


data = {'text': 'value',
        'list': [1, 2, 3],
        'u': UnknownClass(25)}

s = pickle.dumps(data)
print(s)

del UnknownClass

obj = pickle.loads(s)

The del statement is here to simulate an unknown type.

The result will be:

Traceback (most recent call last):
  File "/path/to/stack.py", line 19, in <module>
    obj = pickle.loads(s)
AttributeError: Can't get attribute 'UnknownClass' on <module '__main__' from '/path/to/stack.py'>
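If you just want to look inside such a file, one possible workaround is to subclass pickle.Unpickler and override its find_class method so that missing classes are replaced by a dummy class. This is only a sketch: Placeholder, TolerantUnpickler and tolerant_loads are hypothetical names, and it assumes the missing class is rebuilt the default way (an empty instance whose __dict__ is then filled in), which is what happens for plain classes like UnknownClass.

```python
import io
import pickle


class Placeholder:
    """Hypothetical stand-in for a class that cannot be found at load time."""

    _module = None  # module the missing class was expected in
    _name = None    # name of the missing class

    def __repr__(self):
        return f"<missing {self._module}.{self._name} {self.__dict__!r}>"


class TolerantUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        try:
            return super().find_class(module, name)
        except (AttributeError, ImportError):
            # Build a Placeholder subclass named after the missing class so
            # that pickle can create an instance and fill in its __dict__.
            return type(name, (Placeholder,), {"_module": module, "_name": name})


def tolerant_loads(data: bytes):
    """Like pickle.loads(), but missing classes become Placeholder objects."""
    return TolerantUnpickler(io.BytesIO(data)).load()
```

With the example above, replacing pickle.loads(s) by tolerant_loads(s) should give you the dict back, with the 'u' entry printed as something like <missing __main__.UnknownClass {'value': 25}>. Classes with a custom __reduce__ or required constructor arguments may still fail, and this does nothing to make untrusted pickles safe.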

For more information, the protocols are described in the Python documentation.

I would recommend looking at the Python documentation, in particular the pickle module docs. Your current code imports pickle, but it doesn't actually use it, since you're just reading the file's raw bytes with read(). Using pickle.load() (or another pickle function) should do the trick.

For example:

d_c = pickle.load(data_pkl)
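A slightly fuller sketch of the same fix, using a with block so the file is always closed even if loading fails. load_pickle is just an illustrative helper name, not part of the pickle module.

```python
import pickle


def load_pickle(path):
    """Deserialize the object stored in a pickle file (trusted files only!)."""
    # Open in binary mode; the context manager closes the file for us.
    with open(path, "rb") as f:
        return pickle.load(f)
```

You would call it as d_c = load_pickle("data.pkl") and, if the result is a large nested structure, print it with pprint.pprint(d_c) for a more readable layout.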

Edit: adding the mandatory pickle warning from the docs:

Warning: The pickle module is not secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.

(Unpickling an unknown file leaves you open to having arbitrary code executed on your computer, so be careful what you unpickle!)
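The pickle docs describe a partial mitigation: override Unpickler.find_class so that only an allow-list of globals can be loaded. The sketch below follows that idea; the allow-list and the names RestrictedUnpickler and restricted_loads are my own assumptions. It reduces the risk but is not a guarantee, so the advice above still stands.

```python
import builtins
import io
import pickle

# Hypothetical allow-list: only these builtins may be referenced by the pickle.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Every class/function reference in the stream goes through here,
        # so refusing unknown globals blocks payloads like os.system.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads(), but refuses to resolve non-allow-listed globals."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Plain dicts, lists, strings and numbers don't need find_class at all, so they load normally; anything referring to another module or class raises pickle.UnpicklingError instead of being imported.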

When Google brought me to this question, the answer I would have liked to see was to import pickletools and then use pickletools.dis(s) to explain what the various characters between the understandable substrings within the pickle s were indicating. This is only marginally human-readable, since it reads more like machine assembly language than Python, but it still helps a human reader peer behind the curtain and make some sense of the gobbledygook.
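A minimal sketch of that approach, reusing the dict from the first answer. I pin protocol 3 so the listing should line up with the bytes shown there; with your default protocol the opcodes will differ somewhat.

```python
import io
import pickle
import pickletools

data = {'text': 'value', 'list': [1, 2, 3]}
# Protocol 3 to match the bytes shown earlier in this thread.
s = pickle.dumps(data, protocol=3)

# dis() writes one line per opcode: byte offset, opcode name, argument.
buf = io.StringIO()
pickletools.dis(s, out=buf)
listing = buf.getvalue()
print(listing)
```

Passing annotate=True to pickletools.dis adds a short description of each opcode, which helps when you don't know what something like BINPUT or SETITEMS means.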

Of course, what we usually want isn't for humans to read serialized data, but for computers to read it and make good use of it. When that's what you want, pickle.load or pickle.loads is the way to go. Or, if for some reason you want to serialize your data in a format that is both human-readable and machine-readable, you probably want a different serializer, such as JSON, or you could tell pickle to use the original protocol 0, which is human-readable (but less efficient).
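To compare the two options mentioned above, here is a short sketch that serializes the same dict as JSON and as pickle protocol 0. JSON only handles dicts, lists, strings, numbers, booleans and None (tuples come back as lists), while protocol 0 is still pickle, so the security warning above applies to it too.

```python
import json
import pickle

data = {'text': 'value', 'list': [1, 2, 3]}

# JSON: human-readable and portable, but limited to basic types.
as_json = json.dumps(data, indent=2)
print(as_json)

# Pickle protocol 0: an ASCII, line-oriented format that can still
# represent arbitrary Python objects.
p0 = pickle.dumps(data, protocol=0)
print(p0.decode("ascii"))
```

The protocol 0 output is readable in the sense that strings and numbers appear as plain text, one opcode per line, but the opcodes themselves still need pickletools (or the docs) to interpret.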
