Porting pickle py2 to py3 strings become bytes

I have a pickle file that was created with Python 2.7 that I'm trying to port to Python 3.6. The file is saved in Python 2.7 via pickle.dumps(self.saved_objects, -1)

and loaded in Python 3.6 via loads(data, encoding="bytes") (from a file opened in rb mode). If I try opening in r mode and pass encoding=latin1 to loads, I get UnicodeDecode errors. When I open it as a byte stream it loads, but literally every string is now a byte string. Every object's __dict__ keys are all b"a_variable_name", which then generates attribute errors when calling an_object.a_variable_name, because __getattr__ passes a string and __dict__ only contains bytes. I feel like I've tried every combination of arguments and pickle protocols already. Apart from forcibly converting all objects' __dict__ keys to strings, I'm at a loss. Any ideas?

** Skip to 4/28/17 update for better example **

-------------------------------------------------------------------------------------------------------------

** Update 4/27/17 **

This minimal example illustrates my problem:

From py 2.7.13

import pickle

class test(object):
    def __init__(self):
        self.x = u"test ¢" # including a unicode str breaks things

t = test()
dumpstr = pickle.dumps(t)

>>> dumpstr
"ccopy_reg\n_reconstructor\np0\n(c__main__\ntest\np1\nc__builtin__\nobject\np2\nNtp3\nRp4\n(dp5\nS'x'\np6\nVtest \xa2\np7\nsb."

From py 3.6.1

import pickle

class test(object):
    def __init__(self):
        self.x = "xyz"

dumpstr = b"ccopy_reg\n_reconstructor\np0\n(c__main__\ntest\np1\nc__builtin__\nobject\np2\nNtp3\nRp4\n(dp5\nS'x'\np6\nVtest \xa2\np7\nsb."

t = pickle.loads(dumpstr, encoding="bytes")

>>> t
<__main__.test object at 0x040E3DF0>
>>> t.x
Traceback (most recent call last):
  File "<pyshell#15>", line 1, in <module>
    t.x
AttributeError: 'test' object has no attribute 'x'
>>> t.__dict__
{b'x': 'test ¢'} 
>>> 

-------------------------------------------------------------------------------------------------------------

Update 4/28/17

To re-create my issue I'm posting my actual raw pickle data here

The pickle file was created in Python 2.7.13, Windows 10 using

with open("raw_data.pkl", "wb") as fileobj:
    pickle.dump(library, fileobj, protocol=0)

(protocol 0 so it's human readable)

To run it you'll need classes.py

# classes.py

class Library(object): pass


class Book(object): pass


class Student(object): pass


class RentalDetails(object): pass

And the test script here:

# load_pickle.py
import pickle, sys, itertools, os

raw_pkl = "raw_data.pkl"
is_py3 = sys.version_info.major == 3

read_modes = ["rb"]
encodings = ["bytes", "utf-8", "latin-1"]
fix_imports_choices = [True, False]
files = ["raw_data_%s.pkl" % x for x in range(3)]


def py2_test():
    with open(raw_pkl, "rb") as fileobj:
        loaded_object = pickle.load(fileobj)
        print("library dict: %s" % (loaded_object.__dict__.keys()))
        return loaded_object


def py2_dumps():
    library = py2_test()
    for protocol, path in enumerate(files):
        print("dumping library to %s, protocol=%s" % (path, protocol))
        with open(path, "wb") as writeobj:
            pickle.dump(library, writeobj, protocol=protocol)


def py3_test():
    # this test iterates over the different options trying to load
    # the data pickled with py2 into a py3 environment
    print("starting py3 test")
    for (read_mode, encoding, fix_import, path) in itertools.product(read_modes, encodings, fix_imports_choices, files):
        py3_load(path, read_mode=read_mode, fix_imports=fix_import, encoding=encoding)


def py3_load(path, read_mode, fix_imports, encoding):
    from traceback import print_exc
    print("-" * 50)
    print("path=%s, read_mode = %s fix_imports = %s, encoding = %s" % (path, read_mode, fix_imports, encoding))
    if not os.path.exists(path):
        print("start this file with py2 first")
        return
    try:
        with open(path, read_mode) as fileobj:
            loaded_object = pickle.load(fileobj, fix_imports=fix_imports, encoding=encoding)
            # print the object's __dict__
            print("library dict: %s" % (loaded_object.__dict__.keys()))
            # consider the test a failure if any member attributes are saved as bytes
            test_passed = not any((isinstance(k, bytes) for k in loaded_object.__dict__.keys()))
            print("Test %s" % ("Passed!" if test_passed else "Failed"))
    except Exception:
        print_exc()
        print("Test Failed")
    input("Press Enter to continue...")
    print("-" * 50)


if is_py3:
    py3_test()
else:
    # py2_test()
    py2_dumps()

Put all 3 in the same directory and run c:\python27\python load_pickle.py first, which will create 1 pickle file for each of the 3 protocols. Then run the same command with Python 3 and notice that this version converts the __dict__ keys to bytes. I had it working for about 6 hours, but for the life of me I can't figure out how I broke it again.

In short, you're hitting bug 22005 with the datetime.date objects in the RentalDetails objects.

That can be worked around with the encoding='bytes' parameter, but that leaves your classes with a __dict__ whose keys are bytes:

>>> library = pickle.loads(pickle_data, encoding='bytes')
>>> dir(library)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'str' and 'bytes'

It's possible to manually fix that based on your specific data:

def fix_object(obj):
    """Decode obj.__dict__ containing bytes keys"""
    obj.__dict__ = dict((k.decode("ascii"), v) for k, v in obj.__dict__.items())


def fix_library(library):
    """Walk all library objects and decode __dict__ keys"""
    fix_object(library)
    for student in library.students:
        fix_object(student)
    for book in library.books:
        fix_object(book)
        for rental in book.rentals:
            fix_object(rental)
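The same idea can be written without hard-coding the Library/Book/Student layout. What follows is only a sketch under assumptions: decode_keys and the Node class are hypothetical names for illustration (neither is part of pickle), every bytes key is assumed to be an ASCII attribute name, dictionaries are rewritten in place, and bytes values are left untouched because some of them may be genuinely binary.

```python
def decode_keys(obj, _seen=None):
    """Recursively decode bytes keys of dicts and instance __dict__s (sketch)."""
    seen = set() if _seen is None else _seen
    if id(obj) in seen:  # already visited: guards against reference cycles
        return obj
    seen.add(id(obj))

    if isinstance(obj, dict):
        items = list(obj.items())
        obj.clear()  # rebuild in place so existing references stay valid
        for key, value in items:
            if isinstance(key, bytes):
                key = key.decode("ascii")  # assumption: keys are ASCII names
            obj[key] = decode_keys(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            decode_keys(item, seen)
    elif isinstance(getattr(obj, "__dict__", None), dict):
        decode_keys(obj.__dict__, seen)
    return obj


# Hypothetical stand-in for an object graph loaded with encoding="bytes"
class Node(object):
    pass


root, child = Node(), Node()
child.__dict__ = {b"name": "leaf", b"parent": root}  # back-reference to root
root.__dict__ = {b"children": [child], b"meta": {b"k": 1}}

decode_keys(root)
print(sorted(root.__dict__))    # keys are str now
print(root.children[0].name)
```

Because the walk is recursive, a very deeply nested object graph could hit Python's recursion limit; an explicit stack would avoid that at the cost of a little more code.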

But that's fragile, and enough of a pain that you should be looking for a better option.

1) Implement __getstate__ / __setstate__ methods that map datetime objects to a non-broken representation, for instance:

import datetime


class Event(object):
    """Example class working around datetime pickling bug"""

    def __init__(self):
        self.date = datetime.date.today()

    def __getstate__(self):
        # Store the date as a plain int so no binary datetime state is pickled
        state = self.__dict__.copy()
        state["date"] = state["date"].toordinal()
        return state

    def __setstate__(self, state):
        # Restore the attributes, then rebuild the date from its ordinal
        self.__dict__.update(state)
        self.date = datetime.date.fromordinal(self.date)

2) Don't use pickle at all. Along the lines of __getstate__ / __setstate__, you can just implement to_dict / from_dict methods or similar in your classes for saving their content as json or some other plain format.
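As a rough illustration of option 2, here is a minimal sketch of the to_dict / from_dict idea using json. The Event class mirrors the example above; the method bodies and the choice of an ISO 8601 string for the date are assumptions made for illustration, not something prescribed by the question's data.

```python
import datetime
import json


class Event(object):
    """Hypothetical class that serializes itself to plain JSON types."""

    def __init__(self, date=None):
        self.date = date or datetime.date.today()

    def to_dict(self):
        # Dates become ISO 8601 strings, which read the same in py2 and py3
        return {"date": self.date.isoformat()}

    @classmethod
    def from_dict(cls, data):
        year, month, day = (int(part) for part in data["date"].split("-"))
        return cls(datetime.date(year, month, day))


payload = json.dumps(Event(datetime.date(2017, 2, 16)).to_dict())
restored = Event.from_dict(json.loads(payload))
print(payload)
print(restored.date)
```

The same file can then be written by a Python 2.7 script and read back by Python 3 without any of the str/bytes ambiguity, since JSON only carries text.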

A final note: having a backreference to the library in each object shouldn't be required.

Question: Porting pickle py2 to py3 strings become bytes

The encoding='latin-1' given below is OK.
Your problem with b'' is the result of using encoding='bytes'. This results in dict keys being unpickled as bytes instead of as str.
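The difference between the two encodings can be seen on a tiny hand-written stream. This is a sketch, assuming the protocol 0 bytes below correspond to the Python 2 dict {'x': u'test ¢'}; PY2_DICT is an illustrative name and is not taken from the question's raw data file.

```python
import pickle

# Protocol 0 stream for a Python 2 dict: the key 'x' is a py2 str (opcode S),
# the value u'test \xa2' is a py2 unicode (opcode V).
PY2_DICT = b"(dp0\nS'x'\np1\nVtest \xa2\np2\ns."

as_bytes = pickle.loads(PY2_DICT, encoding="bytes")
as_latin1 = pickle.loads(PY2_DICT, encoding="latin-1")

print(as_bytes)    # the py2 str key stays bytes: {b'x': 'test ¢'}
print(as_latin1)   # the py2 str key is decoded to str: {'x': 'test ¢'}
```

Only py2 str objects are affected by the encoding argument; py2 unicode objects come back as str either way.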

The problem data are the datetime.date values '\x07á\x02\x10', starting at line 56 in raw-data.pkl.

It's a known issue, as already pointed out.
Unpickling python2 datetime under python3
http://bugs.python.org/issue22005
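To see the issue in isolation, the sketch below feeds a hand-written protocol 2 stream for datetime.date(2017, 2, 16) to pickle.loads with different encodings. The stream is an assumption modeled on what Python 2 emits (memo opcodes omitted), and PY2_DATE and try_load are illustrative names. The result for latin-1 depends on the interpreter: as far as I know it fails on 3.6 and earlier, and later releases accept it as part of the fix for issue 22005, so the code reports the outcome instead of assuming it.

```python
import datetime
import pickle

# GLOBAL datetime.date, a 4-byte py2 str holding the packed state
# (year 0x07e1 = 2017, month 2, day 0x10 = 16), TUPLE1, REDUCE, STOP.
PY2_DATE = b"\x80\x02cdatetime\ndate\nU\x04\x07\xe1\x02\x10\x85R."


def try_load(encoding):
    """Return the unpickled value, or the exception raised while loading."""
    try:
        return pickle.loads(PY2_DATE, encoding=encoding)
    except Exception as exc:
        return exc


results = {enc: try_load(enc) for enc in ("ASCII", "latin-1", "bytes")}
for enc, outcome in results.items():
    print("%-8s -> %r" % (enc, outcome))
```

The default encoding (ASCII) cannot decode the byte 0xe1 at all, which explains the UnicodeDecode errors, while encoding="bytes" hands the state to datetime.date unchanged.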

For a workaround, I have patched pickle.py and got an unpickled object, e.g.

book.library.books[0].rentals[0].rental_date=2017-02-16


This will work for me:

t = pickle.loads(dumpstr, encoding="latin-1")

Output:
<__main__.test object at 0xf7095fec>
t.__dict__={'x': 'test ¢'}
test ¢

Tested with Python:3.4.2

You should treat pickle data as specific to the (major) version of Python that created it.

(See Gregory Smith's message w.r.t. issue 22005.)

The best way to get around this is to write a Python 2.7 program to read the pickled data, and write it out in a neutral format.

Taking a quick look at your actual data, it seems to me that an SQLite database is appropriate as an interchange format, since the Books contain references to a Library and RentalDetails. You could create separate tables for each.
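A possible shape for such an export, as a sketch only: the table and column names below (book.title, student.name, and so on) are invented for illustration, since the real attributes of Book, Student and RentalDetails are not shown here. The Python 2.7 side would unpickle the library and pass plain tuples to export_rows; an in-memory database stands in for the real file.

```python
import sqlite3

# Hypothetical schema: one table per class, rentals link books to students
SCHEMA = """
CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE rental (
    id INTEGER PRIMARY KEY,
    book_id INTEGER REFERENCES book(id),
    student_id INTEGER REFERENCES student(id),
    rental_date TEXT  -- ISO 8601 string, e.g. 2017-02-16
);
"""


def export_rows(conn, books, students, rentals):
    """Create the tables and insert rows given as plain tuples."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO book VALUES (?, ?)", books)
    conn.executemany("INSERT INTO student VALUES (?, ?)", students)
    conn.executemany("INSERT INTO rental VALUES (?, ?, ?, ?)", rentals)
    conn.commit()


conn = sqlite3.connect(":memory:")  # use a file path for a real export
export_rows(
    conn,
    books=[(1, "Example title")],
    students=[(1, "Example student")],
    rentals=[(1, 1, 1, "2017-02-16")],
)
rows = conn.execute(
    "SELECT b.title, s.name, r.rental_date FROM rental r "
    "JOIN book b ON b.id = r.book_id "
    "JOIN student s ON s.id = r.student_id"
).fetchall()
print(rows)
```

Since the sqlite3 module ships with both Python 2.7 and Python 3, the same database file can be written by one and read by the other.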
