
Unable to load a previously dumped pickle file of large size in Python

I used cPickle and protocol version 2 to dump some computation results. The code looks like this:

f = open('foo.pck', 'w')
cPickle.dump(var, f, protocol=2)
f.close()

The variable var is a tuple of length two. The type of var[0] is a list and var[1] is a numpy.ndarray.

The above code segment successfully generated a large file (~1.7 GB).

However, when I tried to load the variable from foo.pck, I got the following error.

ValueError                                Traceback (most recent call last)
/home/user_account/tmp/<ipython-input-3-fd3ecce18dcd> in <module>()
----> 1 v = cPickle.load(f)
ValueError: buffer size does not match array size

The loading code looks like this:

f = open('foo.pck', 'r')
v = cPickle.load(f)

I also tried to use pickle (instead of cPickle) to load the variable, but got a similar error message:

ValueError                                Traceback (most recent call last)
/home/user_account/tmp/<ipython-input-3-aa6586c8e4bf> in <module>()
----> 1 v = pickle.load(f)

/usr/lib64/python2.6/pickle.pyc in load(file)
   1368 
   1369 def load(file):
-> 1370     return Unpickler(file).load()
   1371 
   1372 def loads(str):

/usr/lib64/python2.6/pickle.pyc in load(self)
    856             while 1:
    857                 key = read(1)
--> 858                 dispatch[key](self)
    859         except _Stop, stopinst:
    860             return stopinst.value

/usr/lib64/python2.6/pickle.pyc in load_build(self)
   1215         setstate = getattr(inst, "__setstate__", None)
   1216         if setstate:
-> 1217             setstate(state)
   1218             return
   1219         slotstate = None

ValueError: buffer size does not match array size

I tried the same code on much smaller data and it worked fine. So my best guess is that I hit a loading size limit in pickle (or cPickle). However, it is strange that the dump succeeded (with a large variable) while the load failed.

If this is indeed a loading size limitation, how can I work around it? If not, what could be the cause of the problem?

Any suggestion is appreciated. Thanks!

How about saving and loading the numpy array with numpy.save() and np.load()?

You can save the pickled list and the numpy array to the same file:

import numpy as np
import cPickle

data = np.random.rand(50000000)

f = open('foo.pck', 'wb')
cPickle.dump([1, 2, 3], f, protocol=2)  # pickle the list first
np.save(f, data)  # then append the array in numpy's own binary format
f.close()

To read the data back:

import cPickle
import numpy as np

f = open('foo.pck', 'rb')
v = cPickle.load(f)  # reads only the pickled list, leaving the file position at the array
data = np.load(f)    # reads the array that follows
print data.shape, data
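For readers on Python 3, a minimal sketch of the same idea (cPickle was merged into the standard pickle module in Python 3, and numpy's save/load API is unchanged; the small array size here is just for illustration):

```python
import pickle
import numpy as np

data = np.random.rand(1000)  # small array just to demonstrate the round trip

# Write the list with pickle, then append the array with np.save.
with open('foo.pck', 'wb') as f:
    pickle.dump([1, 2, 3], f, protocol=2)
    np.save(f, data)

# Read back in the same order: pickle stops at the end of its own data,
# so np.load picks up exactly where the pickled list ended.
with open('foo.pck', 'rb') as f:
    v = pickle.load(f)
    loaded = np.load(f)

print(v)             # [1, 2, 3]
print(loaded.shape)  # (1000,)
```

Because np.save stores the array in numpy's raw binary format rather than as a pickle stream, this sidesteps the pickle buffer handling that failed on the large array.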
