
NumPy: load a memory-mapped array (mmap_mode) from Google Cloud Storage

I want to load a .npy file from Google Storage (gs://project/file.npy) into my Google ML job as training data. Since the file is larger than 10 GB, I want to use the mmap_mode option of numpy.load() to avoid running out of memory.

Background: I use Keras with fit_generator and Keras Sequence to load batches of data from the .npy that is stored on google storage.

To access Google Storage I'm using BytesIO, since not every library can read from Google Storage directly. This code works fine without mmap_mode='r':

from io import BytesIO

import numpy as np
from tensorflow.python.lib.io import file_io

filename = 'gs://project/file'

# Read the whole file from Google Storage into an in-memory buffer
x_file = BytesIO(file_io.read_file_to_string(filename + '.npy', binary_mode=True))
x = np.load(x_file)

If I activate mmap_mode, I get this error:

TypeError: expected str, bytes or os.PathLike object, not BytesIO

I don't understand why np.load no longer accepts the BytesIO object once mmap_mode is set.

Code including mmap_mode:

x_file = BytesIO(file_io.read_file_to_string(filename + '.npy', binary_mode = True))
x = np.load(x_file, mmap_mode = 'r')

Trace:

File "[...]/numpy/lib/npyio.py", line 444, in load
    return format.open_memmap(file, mode=mmap_mode)
File "[...]/numpy/lib/format.py", line 829, in open_memmap
    fp = open(os_fspath(filename), 'rb')
File "[...]/numpy/compat/py3k.py", line 237, in os_fspath
    "not " + path_type.__name__)
TypeError: expected str, bytes or os.PathLike object, not BytesIO
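The behaviour in the trace can be reproduced without Google Storage. The sketch below is only an illustration: the small array and the temporary file are hypothetical stand-ins for the real data, and `try_mmap_load` is a helper invented for this example. It catches both TypeError and ValueError because the exact exception may differ between NumPy versions.

```python
import os
import tempfile
from io import BytesIO

import numpy as np


def try_mmap_load(source):
    """Return the memory-mapped array, or the exception np.load raised."""
    try:
        return np.load(source, mmap_mode='r')
    except (TypeError, ValueError) as exc:
        return exc


# Hypothetical stand-in for the real training data.
data = np.arange(12, dtype=np.float32).reshape(4, 3)

# Save it to a real file on the local filesystem.
local_path = os.path.join(tempfile.mkdtemp(), 'file.npy')
np.save(local_path, data)

# Build an in-memory buffer holding exactly the same bytes.
with open(local_path, 'rb') as f:
    buffer = BytesIO(f.read())

from_buffer = try_mmap_load(buffer)    # an exception: no file path to map
from_path = try_mmap_load(local_path)  # a numpy.memmap backed by the file

print(type(from_buffer).__name__)
print(type(from_path).__name__)
```

The same bytes load fine from a path but not from the buffer, which suggests that mmap_mode depends on having a file on disk and not on the file contents.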

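A memory map is backed by a file on disk, so np.load with mmap_mode needs the path of a real local file; it cannot map an in-memory BytesIO object. Converting the buffer to bytes with x_file.getvalue() does not help either, because np.load then treats the bytes as a file path, not as file contents. Note also that reading the whole file into a BytesIO already pulls all 10 GB into memory, which defeats the purpose of mmap_mode. One option is to copy the file from Google Storage to local disk first and memory-map the local copy (the /tmp path is only an example, and the job needs enough local disk space):

local_path = '/tmp/file.npy'
file_io.copy(filename + '.npy', local_path, overwrite=True)
x = np.load(local_path, mmap_mode='r')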
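Once the .npy file is available at a local path, batches can be sliced out of the memory map so that only the requested rows are read from disk. The following is a minimal sketch under that assumption: `NpyBatchReader` is a hypothetical class that only mimics the `__len__`/`__getitem__` interface of a Keras Sequence, and the small temporary file stands in for the real data.

```python
import math
import os
import tempfile

import numpy as np


class NpyBatchReader:
    """Hypothetical helper with the __len__/__getitem__ shape of a Keras Sequence."""

    def __init__(self, path, batch_size):
        # Only the header is parsed here; array data stays on disk.
        self.x = np.load(path, mmap_mode='r')
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches, counting a final partial batch.
        return math.ceil(self.x.shape[0] / self.batch_size)

    def __getitem__(self, idx):
        start = idx * self.batch_size
        stop = start + self.batch_size
        # np.array copies just this slice into an ordinary in-memory ndarray.
        return np.array(self.x[start:stop])


# Hypothetical stand-in for the file copied from Google Storage.
local_path = os.path.join(tempfile.mkdtemp(), 'file.npy')
np.save(local_path, np.arange(20, dtype=np.float32).reshape(10, 2))

reader = NpyBatchReader(local_path, batch_size=4)
first = reader[0]
last = reader[len(reader) - 1]
print(len(reader), first.shape, last.shape)
```

In a real training job the same slicing logic would sit inside a keras.utils.Sequence subclass, with the labels handled alongside the inputs.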
