NumPy: load a memory-mapped array (mmap_mode) from Google Cloud Storage

Question

I want to load a .npy file from Google Cloud Storage (gs://project/file.npy) into my Google ML job as training data. Since the file is larger than 10 GB, I want to use the mmap_mode option of numpy.load() so that I don't run out of memory.

Background: I use Keras with fit_generator and a Keras Sequence to load batches of data from the .npy file stored on Google Cloud Storage.

To access Google Cloud Storage I read the file into a BytesIO, since not every library can open gs:// paths directly. This code works fine without mmap_mode = 'r':

import numpy as np
from tensorflow.python.lib.io import file_io
from io import BytesIO

filename = 'gs://project/file'

x_file = BytesIO(file_io.read_file_to_string(filename + '.npy', binary_mode = True))
x = np.load(x_file)

If I activate mmap_mode, I get this error:

TypeError: expected str, bytes or os.PathLike object, not BytesIO

I don't understand why it no longer accepts the BytesIO.

Code including mmap_mode:

x_file = BytesIO(file_io.read_file_to_string(filename + '.npy', binary_mode = True))
x = np.load(x_file, mmap_mode = 'r')

Trace:

File "[...]/numpy/lib/npyio.py", line 444, in load
    return format.open_memmap(file, mode=mmap_mode)
File "[...]/numpy/lib/format.py", line 829, in open_memmap
    fp = open(os_fspath(filename), 'rb')
File "[...]/numpy/compat/py3k.py", line 237, in os_fspath
    "not " + path_type.__name__)
TypeError: expected str, bytes or os.PathLike object, not BytesIO

Answer 1

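You can convert the BytesIO to bytes with x_file.getvalue():

x_file = BytesIO(file_io.read_file_to_string(filename + '.npy', binary_mode = True))
x = np.load(x_file.getvalue(), mmap_mode = 'r')

Be aware that np.load treats a str or bytes argument as a file path, not as file contents, so memory-mapping still needs the .npy file to be available on a local filesystem.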
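As the trace in the question shows, open_memmap calls open() on whatever it is given, so mmap_mode needs a path to a real file rather than an in-memory buffer. A minimal sketch of that behaviour follows, assuming the .npy file has first been staged on local disk (for example by copying it from the bucket with file_io.copy or gsutil cp). The name local_path and the small stand-in array are hypothetical and only there to make the example runnable without a bucket.

```python
import os
import tempfile
from io import BytesIO

import numpy as np

# Stand-in for the file staged from gs://project/file.npy.
# local_path is a hypothetical name, not taken from the original post.
local_path = os.path.join(tempfile.mkdtemp(), 'file.npy')
np.save(local_path, np.arange(12, dtype=np.float32).reshape(4, 3))

# An in-memory buffer cannot be memory-mapped: there is no file on disk to map.
with open(local_path, 'rb') as f:
    buffer = BytesIO(f.read())
try:
    np.load(buffer, mmap_mode='r')
    buffer_error = None
except (TypeError, ValueError) as exc:
    buffer_error = exc

# A path to a local file works; data is read from disk only when indexed.
x = np.load(local_path, mmap_mode='r')
batch = np.asarray(x[1:3])  # copy one batch of rows into regular memory
```

Inside a Keras Sequence, the same slicing could be done in __getitem__, so that only the rows of the current batch are read from disk.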
