
How to read data serialized by Python 2 cPickle with Python 3 pickle?

I'm trying to work with the CIFAR-10 dataset, which is available in a special version for Python.

It is a set of binary files, each representing a dictionary of 10k numpy matrices. The files were obviously created by Python 2 cPickle.

I tried to load it from Python 2 as follows:

import cPickle
with open("data/data_batch_1", "rb") as f:
    data = cPickle.load(f)

This works fine. However, if I try to load the data from Python 3 (which has no cPickle, only pickle), it fails:

import pickle
with open("data/data_batch_1", "rb") as f:
    data = pickle.load(f)

It fails with the following error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 6: ordinal not in range(128)

Can I somehow transform the original dataset into a new one that is readable from Python 3? Or can I somehow read it from Python 3 directly?

I've tried loading it with cPickle, dumping it to JSON, and reading it back with pickle, but numpy matrices obviously can't be written to a JSON file.

You'll need to tell pickle what codec to use for those bytestrings, or tell it to load the data as bytes instead. From the pickle.load() documentation:

The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to 'ASCII' and 'strict', respectively. The encoding can be 'bytes' to read these 8-bit string instances as bytes objects.
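To see the effect of the encoding argument without the dataset at hand, here is a minimal sketch. The payload is a hand-written protocol-2 pickle that imitates what Python 2 would emit for a small dict of str objects; it is an illustrative stand-in, not part of CIFAR-10. The 'latin1' codec is shown as one possible choice of codec, since it maps every byte value to a code point and therefore cannot fail.

```python
import pickle

# Hypothetical stand-in for a Python 2 pickle of {'data': '\x8b\x01\x02'}.
# Opcodes: PROTO 2, EMPTY_DICT, SHORT_BINSTRING key, SHORT_BINSTRING value,
# SETITEM, STOP (BINPUT memo entries in between).
PY2_PAYLOAD = b"\x80\x02}q\x00U\x04dataq\x01U\x03\x8b\x01\x02q\x02s."

# Default encoding is 'ASCII' with errors='strict', so byte 0x8b is rejected.
try:
    pickle.loads(PY2_PAYLOAD)
    default_failed = False
except UnicodeDecodeError:
    default_failed = True

# encoding='bytes' leaves every Python 2 str as a bytes object, keys included.
as_bytes = pickle.loads(PY2_PAYLOAD, encoding="bytes")

# encoding='latin1' decodes every Python 2 str to a Python 3 str instead.
as_latin1 = pickle.loads(PY2_PAYLOAD, encoding="latin1")

print(default_failed)
print(as_bytes)
print(as_latin1)
```

Which option is preferable depends on whether the 8-bit strings in the file are text or raw binary data.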

To load the strings as bytes objects, that would be:

import pickle
with open("data/data_batch_1", "rb") as f:
    data = pickle.load(f, encoding='bytes')
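One consequence of encoding='bytes' is that dictionary keys that were Python 2 str objects also come back as bytes, so a lookup with a plain string key will not match. If that is inconvenient, a small helper can decode the top-level keys. The helper name decode_keys and the sample dictionary below are hypothetical, used only for illustration; the real batch contents may differ.

```python
def decode_keys(obj, encoding="ascii"):
    """Return a copy of a dict with top-level bytes keys decoded to str.

    Values are left untouched, so binary payloads stay as bytes.
    """
    return {
        (key.decode(encoding) if isinstance(key, bytes) else key): value
        for key, value in obj.items()
    }


# Hypothetical stand-in for what pickle.load(f, encoding='bytes') might return.
sample = {b"labels": [6, 9], b"batch_label": b"training batch 1 of 5"}

decoded = decode_keys(sample)
print(sorted(decoded))
```

Only the keys are decoded here; decoding values blindly would risk corrupting data that is genuinely binary.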
