Pandas backwards compatibility issue with pickle 0.14.1 and 0.15.2

We're using the pandas DataFrame as our primary container for time series data. We pack each DataFrame into a binary blob and store it in a MongoDB document, along with keys holding metadata about the time series blob.

We ran into an error when we upgraded from pandas 0.14.1 to 0.15.2.

Create binary blob of pandas DataFrame (0.14.1)

import lz4   
import cPickle

bd = lz4.compress(cPickle.dumps(df,cPickle.HIGHEST_PROTOCOL))

Error case: read back in from MongoDB with pandas 0.15.2

cPickle.loads(lz4.decompress(bd))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-37-76f7b0b41426> in <module>()
----> 1 cPickle.loads(lz4.decompress(bd))
TypeError: ('_reconstruct: First argument must be a sub-type of ndarray', <built-in function _reconstruct>, (<class 'pandas.core.index.Index'>, (0,), 'b'))

Success case: reading the same blob back from MongoDB with pandas 0.14.1 raises no error.

This seems to be similar to an old Stack Overflow thread, "Pandas compiled from source: default pickle behavior changed", which has a helpful comment from https://stackoverflow.com/users/644898/jeff:

"The error message you are seeing, `TypeError: _reconstruct: First argument must be a sub-type of ndarray`, is that the python default unpickler makes sure that the class hierarchy that was pickled is exactly the same as what it is recreating. Since Series has changed between versions, this is no longer possible with the default unpickler (this IMHO is a bug in the way pickle works). In any event, pandas will unpickle pre-0.13 pickles that have Series objects."

Any ideas on workarounds or solutions?

To recreate error:

Setup in pandas 0.14.1 env:

import cPickle
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(10,10))
cPickle.dump(df,open("cp0141.p","wb"))
cPickle.load(open('cp0141.p','rb')) # no error

Create error in pandas 0.15.2 env:

cPickle.load(open('cp0141.p','rb'))
TypeError: ('_reconstruct: First argument must be a sub-type of ndarray', <built-in function _reconstruct>, (<class 'pandas.core.index.Int64Index'>, (0,), 'b'))

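This change was explicitly mentioned in the pandas 0.15.0 release notes: the Index class no longer subclasses ndarray and is now a pandas object. The plain unpickler therefore cannot rebuild an Index that was pickled as an ndarray subclass.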

You simply need to use pd.read_pickle to read the pickles.
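As far as I can tell, pd.read_pickle in 0.15.2 takes a file path, so the in-memory blobs from the question cannot be handed to it directly. Below is a minimal sketch of one way to bridge that gap, assuming it is acceptable to write the decompressed bytes to a temporary file first. The helper name load_pickle_blob and its reader parameter are hypothetical and not part of pandas.

```python
import os
import tempfile


def load_pickle_blob(raw_bytes, reader):
    """Write raw pickle bytes to a temporary file and load it via `reader`.

    `reader` is any callable that takes a file path and returns the
    unpickled object. The intent is to pass pd.read_pickle here
    (an assumption about how you would wire it up).
    """
    fd, path = tempfile.mkstemp(suffix=".p")
    try:
        # Close the file before reading so the reader can reopen it by path.
        with os.fdopen(fd, "wb") as fh:
            fh.write(raw_bytes)
        return reader(path)
    finally:
        # Always clean up the temporary file, even if the reader raises.
        os.remove(path)


# Hypothetical usage with the blob `bd` from the question:
#   import lz4
#   import pandas as pd
#   df = load_pickle_blob(lz4.decompress(bd), reader=pd.read_pickle)
```

The reader is passed in rather than hard-coded so the same helper can be tried with a different loader if pd.read_pickle does not suit your setup.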

