
Decode pickle file in human-readable format in python

import pickle

data_pkl = open("data.pkl", "rb")
d_c = data_pkl.read()
data_pkl.close()
print(d_c)

When data is dumped, pickle produces a bytes string. That is what you are printing: read() just returns the raw bytes of the file.

For instance:

import pickle

data = {'text': 'value', 'list': [1, 2, 3]}

s = pickle.dumps(data)
print(s)

With protocol 3 (the default in Python 3.0 through 3.7), this produces the bytes string:

b'\x80\x03}q\x00(X\x04\x00\x00\x00textq\x01X\x05\x00\x00'
b'\x00valueq\x02X\x04\x00\x00\x00listq\x03]q\x04(K\x01K'
b'\x02K\x03eu.'

Note: I split the long line into 3 parts for readability.

Python defines several pickle protocols; pickle.HIGHEST_PROTOCOL and pickle.DEFAULT_PROTOCOL give the newest one and the default one. So if you change the protocol, you can get a different result.
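To see the effect, the sketch below pickles the same dictionary with every protocol the running interpreter supports and prints the length and first bytes of each result. The exact bytes and lengths depend on your Python version; what does not change is that pickle.loads gives back an equal object for every protocol.

```python
import pickle

data = {'text': 'value', 'list': [1, 2, 3]}

print("HIGHEST_PROTOCOL =", pickle.HIGHEST_PROTOCOL)
print("DEFAULT_PROTOCOL =", pickle.DEFAULT_PROTOCOL)

# Pickle the same object once per supported protocol (0 .. HIGHEST_PROTOCOL).
encoded = {
    proto: pickle.dumps(data, protocol=proto)
    for proto in range(pickle.HIGHEST_PROTOCOL + 1)
}

for proto, blob in encoded.items():
    # Protocols 2 and later start with the PROTO opcode b'\x80' followed by
    # the protocol number; protocols 0 and 1 have no such header.
    print(proto, len(blob), blob[:16])
    # Whatever the protocol, loading gives back an equal object.
    assert pickle.loads(blob) == data
```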

To turn this bytes string back into Python objects, you need to use pickle.load to read from a file object (or pickle.loads to read from a bytes string).

For instance:

import pprint

obj = pickle.loads(s)
pprint.pprint(obj)

You get:

{'list': [1, 2, 3], 'text': 'value'}

Cool, but if your data contains an instance of an unknown type (a class that cannot be found when loading), you won't be able to deserialize it.

Here is an example:

import pickle
import pprint


class UnknownClass:
    def __init__(self, value):
        self.value = value


data = {'text': 'value',
        'list': [1, 2, 3],
        'u': UnknownClass(25)}

s = pickle.dumps(data)
print(s)

del UnknownClass

obj = pickle.loads(s)

The del statement is here to simulate an unknown type.

The result will be:

Traceback (most recent call last):
  File "/path/to/stack.py", line 19, in <module>
    obj = pickle.loads(s)
AttributeError: Can't get attribute 'UnknownClass' on <module '__main__' from '/path/to/stack.py'>

For more info, the protocols are specified in the Python documentation.

I would recommend looking at the Python documentation, in particular the pickle module docs. Your current code imports pickle, but it doesn't actually use it: read() only returns the raw bytes of the file. Using pickle.load() (or another pickle function) should do the trick.

For example:

d_c = pickle.load(data_pkl)
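Putting that together, a corrected version of the question's code might look like the sketch below. The load_pickle helper name and the temporary file are assumptions made only so the example runs on its own; with a real data.pkl you would simply call load_pickle("data.pkl"). Using with also closes the file if pickle.load raises.

```python
import pickle
import pprint
import tempfile
from pathlib import Path


def load_pickle(path):
    """Return the object stored in the pickle file at *path* (hypothetical helper)."""
    # "rb": pickle data is binary, so the file must be opened in binary mode.
    with open(path, "rb") as data_pkl:
        return pickle.load(data_pkl)


# Demo only: write a throwaway pickle file so the snippet is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.pkl"
    with open(path, "wb") as f:
        pickle.dump({'text': 'value', 'list': [1, 2, 3]}, f)
    d_c = load_pickle(path)

pprint.pprint(d_c)  # {'list': [1, 2, 3], 'text': 'value'}
```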

Editing to add the mandatory pickle warning from the docs:

Warning: The pickle module is not secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.

(Unpickling an unknown file leaves you open to having arbitrary code executed on your computer, so be careful what you unpickle!)
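To make the warning concrete, here is a deliberately harmless sketch. An object's __reduce__ method tells pickle which callable to invoke at load time, so a crafted pickle can make pickle.loads call eval (or anything else importable). Payload, RestrictedUnpickler and restricted_loads are illustrative names. Overriding Unpickler.find_class follows the "Restricting Globals" idea from the pickle docs; it reduces the risk but is not a substitute for only loading trusted data.

```python
import io
import pickle


class Payload:
    """Hypothetical malicious object; the expression used here is harmless."""

    def __reduce__(self):
        # At load time pickle will call eval("6 * 7") and return its result.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# A plain pickle.loads runs the embedded call: the result is 42, not a Payload.
result = pickle.loads(blob)
print(result)  # 42


class RestrictedUnpickler(pickle.Unpickler):
    """Refuses to look up any class or function by name while loading."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Built-in containers and scalars need no global lookup, so they still load.
print(restricted_loads(pickle.dumps({'text': 'value', 'list': [1, 2, 3]})))

try:
    restricted_loads(blob)
    blocked = False
except pickle.UnpicklingError as exc:
    blocked = True
    print("blocked:", exc)
```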

When Google brought me to this question, the answer I would have liked to see was: import pickletools and then use pickletools.dis(s) to explain what the various characters between the understandable substrings within the pickle s indicate. This is only marginally human-readable, since it reads more like machine assembly language than Python, but it still helps a human reader peer behind the curtain and make some sense of the gobbledygook.
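For example, disassembling the protocol 3 pickle from the first answer prints one opcode per line (offset, raw opcode byte, opcode name, argument). Passing out= sends the listing to a file-like object instead of stdout. pickletools only decodes opcodes; it does not import classes or call functions, so it can also inspect a pickle whose classes are not available.

```python
import io
import pickle
import pickletools

data = {'text': 'value', 'list': [1, 2, 3]}
# Protocol 3 matches the bytes string shown in the first answer.
s = pickle.dumps(data, protocol=3)

# Print the disassembly to stdout: one opcode per line.
pickletools.dis(s)

# The same listing captured as a string, e.g. for logging or searching.
buf = io.StringIO()
pickletools.dis(s, out=buf)
listing = buf.getvalue()
```

Adding annotate=1 to the dis call appends a short description of each opcode, which helps when the names alone are cryptic.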

If what you want is the original data back, pickle.load or pickle.loads are the way to go. Or, if for some reason you want to serialize your data in a format that is both human-readable and machine-readable, you probably want some other serializer, like JSON, or you could tell pickle to use the original protocol 0, which is human-readable (but less efficient).
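A minimal comparison of the two options, assuming the data only contains types JSON supports (dicts, lists, strings, numbers, booleans and None):

```python
import json
import pickle

data = {'text': 'value', 'list': [1, 2, 3]}

# JSON: readable by humans and by other languages, but limited to basic types.
as_json = json.dumps(data, indent=2)
print(as_json)
assert json.loads(as_json) == data

# Pickle protocol 0: a text-based format; for this ASCII-only data it decodes
# cleanly, though it is still a stream of opcodes rather than a data format.
as_proto0 = pickle.dumps(data, protocol=0)
print(as_proto0.decode("ascii"))
assert pickle.loads(as_proto0) == data
```

JSON is usually the better choice when the file must be read by people or by non-Python programs; protocol 0 keeps pickle's ability to store arbitrary Python objects but is still far from pleasant to read.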
