
Saving many arrays of different lengths

I have ~8000 arrays of two-dimensional points, stored in memory as a Python list of numpy arrays. Each array has shape (x, 2), where x is a number between ~600 and ~4000. Essentially, I have a jagged 3-d array.

I want to store this data in a convenient/fast format for reading/writing from disk. I'd rather not create ~8000 separate files, but I'd also rather not pad out a full (8000,4000,2) matrix with zeros if I can avoid it.

How should I store my data on disk, such that both filesize and parsing/serialization are minimized?

There's a standard called HDF (Hierarchical Data Format) for storing large numerical data sets. You can find more information at the following link, but in general terms, HDF defines a binary file format designed for storing large amounts of data.

You can find an example here that stores large NumPy arrays on disk. In that post, the writer compares Python's pickle with HDF5.

I also recommend this introduction to HDF5. Here's the h5py package, which is a Pythonic interface to the HDF5 binary data format.
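As a rough sketch of how this could look with h5py (assuming h5py is installed), each array can be written as its own dataset inside a single HDF5 file, so nothing needs to be padded. The file name points.h5, the dataset names arr_00000, arr_00001, ..., and the helper names save_jagged and load_jagged are made up for illustration and do not come from the linked posts:

```python
import os
import tempfile

import h5py
import numpy as np


def save_jagged(path, arrays):
    """Write each (x, 2) array as its own dataset in one HDF5 file."""
    with h5py.File(path, "w") as f:
        for i, arr in enumerate(arrays):
            # Zero-padded names keep the datasets in order when sorted.
            f.create_dataset("arr_%05d" % i, data=arr)


def load_jagged(path):
    """Read every dataset back into a list of numpy arrays."""
    with h5py.File(path, "r") as f:
        return [f[name][()] for name in sorted(f.keys())]


# Small stand-in for the ~8000 arrays in the question.
arrays = [np.random.rand(n, 2) for n in (600, 1500, 4000)]
path = os.path.join(tempfile.mkdtemp(), "points.h5")
save_jagged(path, arrays)
restored = load_jagged(path)
```

Because every array is a separate dataset, a single array can also be read back on its own (for example f["arr_00001"][()]) without loading the rest of the file.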

Put all your NumPy arrays into a single Python list and then pickle that list (using pickle, or cPickle on Python 2).

For example:

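import pickle  # on Python 2, cPickle is a faster drop-in replacement
import numpy as np

# Three arrays of different lengths, each of shape (x, 2).
# (array((5, 2)) would instead build the 1-d array [5, 2].)
a = np.zeros((5, 2))
b = np.ones((10, 2))
c = np.zeros((20, 2))
arrays = [a, b, c]

# Pickle data is binary, so the file is opened in 'wb' mode.
with open('all_my_arrays', 'wb') as f:
    pickle.dump(arrays, f, protocol=pickle.HIGHEST_PROTOCOL)

You can then retrieve them with:

with open('all_my_arrays', 'rb') as f:
    arrays2 = pickle.load(f)

Note that the list arrays does not require any massive new memory allocation. Because arrays is just a list of references to your numpy arrays, nothing has to be padded with zeros or otherwise copied.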
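To make the "no padding" point concrete, here is a small sketch that pickles a jagged list and compares the resulting file size with what a zero-padded block would occupy. The array count (50), the lengths, and the temporary file location are assumptions chosen only to keep the example quick; they are not the asker's real data:

```python
import os
import pickle
import tempfile

import numpy as np

# Deterministic stand-in data: 50 arrays of shape (x, 2), x from 600 to 4000.
lengths = np.linspace(600, 4000, 50).astype(int)
arrays = [np.zeros((n, 2)) for n in lengths]

path = os.path.join(tempfile.mkdtemp(), "all_my_arrays")
with open(path, "wb") as f:
    pickle.dump(arrays, f, protocol=pickle.HIGHEST_PROTOCOL)

with open(path, "rb") as f:
    restored = pickle.load(f)

raw_bytes = sum(a.nbytes for a in arrays)    # bytes of actual point data
padded_bytes = len(arrays) * 4000 * 2 * 8    # a full (50, 4000, 2) float64 block
file_bytes = os.path.getsize(path)           # size of the pickle on disk

print(raw_bytes, file_bytes, padded_bytes)
```

The pickle should come out only slightly larger than the raw data and well below the padded size, since each array is stored at its own length.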

Relative to pickle, HDF5 has the advantages of speed on large arrays and cross-application support (Octave, Perl, etc.). On the other hand, pickle has the advantages of not requiring any extra software installation (it is included with Python) and of natively understanding Python objects.
