I have a couple of huge training files that I am planning to train on. The validation data is fine and I see no problem with it, but the size is huge: 20GB+. Loading even one file crashes Python with a MemoryError.
I tried merging the files into one, but the result is too big.
X = np.load('X150.npy')
Y = np.load('Y150.npy')
Error
~\AppData\Roaming\Python\Python37\site-packages\numpy\lib\format.py in read_array(fp, allow_pickle, pickle_kwargs)
710 if isfileobj(fp):
711 # We can use the fast fromfile() function.
--> 712 array = numpy.fromfile(fp, dtype=dtype, count=count)
713 else:
714 # This is not a real file. We have to read it the
MemoryError:
I need a solution so I can train on huge datasets.
Important: First make sure that your Python is 64-bit. The methods below only support files up to 2GB on 32-bit Python versions.
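One quick way to check this is sketched below. It uses only the standard library; the variable names are illustrative.

```python
import struct
import sys

# Size of a C pointer in bits: 64 on a 64-bit interpreter, 32 on a 32-bit one.
pointer_bits = struct.calcsize("P") * 8

# Equivalent check: sys.maxsize exceeds 2**32 only on 64-bit builds.
is_64bit = sys.maxsize > 2**32

print(f"{pointer_bits}-bit Python, 64-bit: {is_64bit}")
```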
Typically, one should use np.memmap() to use the array without loading it into RAM. From the numpy docs: "Memory-mapped files are used for accessing small segments of large files on disk, without reading the entire file into memory."
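Example usage (np.memmap() reads raw binary data and knows nothing about the .npy header, so dtype and shape must be given explicitly; note that mode='w+' creates or overwrites the file, so use mode='r' to read existing data):
x_file = "X_raw.dat"  # hypothetical raw binary file, not a .npy file
X = np.memmap(x_file, dtype='int', mode='r', shape=(300000, 1000))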
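To make the role of the mode argument concrete, here is a minimal sketch that writes a small raw file and maps it back read-only. The scratch path, dtype, and shape are made up for illustration and are not taken from the question's data.

```python
import os
import tempfile

import numpy as np

# Hypothetical scratch file; np.memmap works on raw binary data with no header.
path = os.path.join(tempfile.mkdtemp(), "demo.dat")

# 'w+' creates (or overwrites) the file, so use it only for new data.
writer = np.memmap(path, dtype=np.float32, mode="w+", shape=(1000, 10))
writer[:] = 1.0
writer.flush()  # push pending changes to disk
del writer

# 'r' maps the existing file read-only; dtype and shape must match what was written.
reader = np.memmap(path, dtype=np.float32, mode="r", shape=(1000, 10))

# Slicing touches only the requested rows; np.array() copies just that slice into RAM.
chunk = np.array(reader[100:200])
print(chunk.shape, chunk.sum())
```

Because the mapping is lazy, opening a 20GB file this way should cost almost no memory until slices are actually read.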
However, since your files are already stored as .npy files, I stumbled upon np.lib.format.open_memmap(), which creates or loads memory-mapped .npy files.
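To read an existing file, pass only the file name and a read mode; dtype and shape are taken from the .npy header (they are only needed with mode='w+', which creates a new file and would overwrite an existing one):
x_file = "X150.npy"
X = np.lib.format.open_memmap(x_file, mode='r')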
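Once the arrays are memory-mapped, training can pull one batch at a time instead of holding everything in RAM. The sketch below is one possible way to do that, not a definitive recipe: it uses np.load() with mmap_mode='r', which also returns a memory-mapped array for .npy files, and the helper iter_batches, the demo file names, and the batch size are all hypothetical. Small stand-in files are created so the example runs anywhere; in practice you would point it at X150.npy and Y150.npy.

```python
import os
import tempfile

import numpy as np

# Small stand-in files (assumption: the real data would be X150.npy / Y150.npy).
tmp = tempfile.mkdtemp()
x_path = os.path.join(tmp, "X_demo.npy")
y_path = os.path.join(tmp, "Y_demo.npy")
np.save(x_path, np.arange(50, dtype=np.float32).reshape(10, 5))
np.save(y_path, np.arange(10, dtype=np.int64))

# mmap_mode='r' maps each file read-only instead of reading it into RAM.
X = np.load(x_path, mmap_mode="r")
Y = np.load(y_path, mmap_mode="r")


def iter_batches(features, labels, batch_size):
    """Yield (x, y) batches, copying only one batch into memory at a time."""
    for start in range(0, features.shape[0], batch_size):
        stop = start + batch_size
        # np.array() makes an in-memory copy of just this slice.
        yield np.array(features[start:stop]), np.array(labels[start:stop])


batches = list(iter_batches(X, Y, batch_size=4))
for x_batch, y_batch in batches:
    print(x_batch.shape, y_batch.shape)
```

Each yielded batch is an ordinary in-memory array, so it can be passed to a training step (for example a framework's per-batch training call or a Python generator input) while the full dataset stays on disk.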
Here are the docs for the second function (from this answer):
>>> print(numpy.lib.format.open_memmap.__doc__)
"""
Open a .npy file as a memory-mapped array.
This may be used to read an existing file or create a new one.
Parameters
----------
filename : str
The name of the file on disk. This may not be a filelike object.
mode : str, optional
The mode to open the file with. In addition to the standard file modes,
'c' is also accepted to mean "copy on write". See `numpy.memmap` for
the available mode strings.
dtype : dtype, optional
The data type of the array if we are creating a new file in "write"
mode.
shape : tuple of int, optional
The shape of the array if we are creating a new file in "write"
mode.
fortran_order : bool, optional
Whether the array should be Fortran-contiguous (True) or
C-contiguous (False) if we are creating a new file in "write" mode.
version : tuple of int (major, minor)
If the mode is a "write" mode, then this is the version of the file
format used to create the file.
Returns
-------
marray : numpy.memmap
The memory-mapped array.
Raises
------
ValueError
If the data or the mode is invalid.
IOError
If the file is not found or cannot be opened correctly.
See Also
--------
numpy.memmap
"""