
Numpy load part of *.npz file in mmap_mode

I know there already exists a similar question, which has not been answered.

I have a very large numpy array saved in an npz file. I don't want to load it completely (it doesn't fit in my RAM); I just want to load a part of it.

This is how the file was generated:

np.savez_compressed('file_name.npz', xxx)

And this is how I would like to load it:

xxx = np.load('file_name.npz', mmap_mode="r")

Now, to actually access the part of the array I am interested in, I should type

a = xxx['arr_0'][0][0][0]

But although this piece is quite small, Python first loads the whole array (I know because my RAM fills up) and only then shows this small part. The same happens if I directly write

xxx = np.load('file_name.npz', mmap_mode="r")['arr_0'][0][0][0]

What am I doing wrong?

mmap_mode does not work with an npz file. An npz is a zip archive. That is, it contains npy files, one per key. You can see this by opening the npz file with an OS archive manager tool.
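The same check can be done from Python instead of an archive manager. This is a minimal sketch: the temporary directory and the small demo array are assumptions made only so the snippet runs on its own.

```python
import os
import tempfile
import zipfile

import numpy as np

# Hypothetical demo data; a real file would already exist on disk.
tmp_dir = tempfile.mkdtemp()
npz_path = os.path.join(tmp_dir, "file_name.npz")
np.savez_compressed(npz_path, np.arange(24).reshape(2, 3, 4))

# An .npz file is an ordinary zip archive, so zipfile can list its members.
with zipfile.ZipFile(npz_path) as archive:
    member_names = archive.namelist()

print(member_names)  # ['arr_0.npy']
```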

I'm a little surprised that your load call doesn't raise an error, but looking at the code I see that it dispatches to the NpzFile loader without even looking at the mmap_mode parameter.

To use mmap, you'll have to extract arr_0.npy (again using the OS tool), and use load on it.

Memory-mapping only works with arrays stored in a binary file on disk (see the documentation), not with compressed archives like .npz.

So when you perform xxx = np.load('file_name.npz', mmap_mode='r'), you load the NpzFile object with the following attributes:

xxx.__dict__
>>> {'_files': ['arr_0.npy'],
     'files': ['arr_0'],
     'allow_pickle': False,
     'pickle_kwargs': {...},
     'zip': <zipfile.ZipFile file=<_io.BufferedReader name='/path/to/xxx_npz/file/file_name.npz'> mode='r'>,
     'f': <numpy.lib.npyio.BagObj at ...>,
     'fid': <_io.BufferedReader name='/path/to/xxx_npz/file/file_name.npz'>}

And when you do xxx['arr_0'], the NpzFile object reads the arr_0.npy member from the zip archive and decompresses it into memory, which in your case is the full numpy array!
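A quick way to confirm this behaviour is to check the type of what comes back. The sketch below uses a small hypothetical array in a temporary directory; with a real file, the lookup is the step that fills the RAM.

```python
import os
import tempfile

import numpy as np

# Hypothetical demo file standing in for the large archive.
tmp_dir = tempfile.mkdtemp()
npz_path = os.path.join(tmp_dir, "file_name.npz")
np.savez_compressed(npz_path, np.arange(24).reshape(2, 3, 4))

# mmap_mode is accepted here but has no effect on .npz archives.
with np.load(npz_path, mmap_mode="r") as npz:
    arr = npz["arr_0"]  # the whole member is decompressed into memory

is_memmap = isinstance(arr, np.memmap)
print(type(arr).__name__, is_memmap)
```

If memory-mapping had taken effect, the result would be an np.memmap instance rather than a plain in-memory array.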

Instead, you can extract the .npz file using:

from zipfile import ZipFile

with ZipFile('path/to/file_name.npz', 'r') as f:
    f.extractall(path='path/where/you/want/to/extract', members=['arr_0.npy'])

And then execute:

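import numpy as np

# Load the extracted file from the directory used in extractall above.
xxx = np.load('path/where/you/want/to/extract/arr_0.npy', mmap_mode='r')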
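Put together, the whole workflow might look like the sketch below. The temporary directory and the small demo array are assumptions for illustration; note that the extracted arr_0.npy is uncompressed, so it needs enough free disk space for the full array.

```python
import os
import tempfile
from zipfile import ZipFile

import numpy as np

# Hypothetical demo data written the same way as in the question.
tmp_dir = tempfile.mkdtemp()
npz_path = os.path.join(tmp_dir, "file_name.npz")
data = np.arange(24, dtype=np.int64).reshape(2, 3, 4)
np.savez_compressed(npz_path, data)

# Extract only the member we need; this writes an uncompressed .npy file.
with ZipFile(npz_path, "r") as f:
    f.extractall(path=tmp_dir, members=["arr_0.npy"])

# Memory-map the extracted file; data is read from disk only when indexed.
xxx = np.load(os.path.join(tmp_dir, "arr_0.npy"), mmap_mode="r")
a = xxx[0][0][0]

print(type(xxx).__name__, a)
```

Only the pages backing the indexed elements are read, so accessing a small slice should not pull the full array into RAM.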

See here for a useful resource on using memory maps in numpy.

