I use a custom datatype, e.g.
datatype = np.dtype('({:n},{:n})f4'.format(10000, 100000))
to read data from a binary file using
np.fromfile(filename, dtype=datatype)
However, defining the datatype with np.dtype raises an error for large shapes like the one above:
ValueError: invalid shape in fixed-type tuple: dtype size in bytes must fit into a C int
Initializing an array of that size is no problem: a = np.zeros((10000, 100000)). So my question is: where does that limitation come from, and how can I get around it? I can of course use a loop and read chunks at a time, but maybe there is a more elegant way?
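The chunked fallback mentioned above can be sketched like this. The file name and sizes are deliberately small hypothetical stand-ins so the sketch runs anywhere; the real shape in the question would be (10000, 100000):

```python
import numpy as np

# Hypothetical small sizes standing in for the real (10000, 100000) shape.
rows, cols = 6, 7
data = np.arange(rows * cols, dtype=np.float32).reshape(rows, cols)
data.tofile("chunks.bin")  # hypothetical file name

# Read one row's worth of float32 values per iteration instead of
# packing the whole 2D block into a single huge subarray dtype.
chunks = []
with open("chunks.bin", "rb") as f:
    while True:
        chunk = np.fromfile(f, dtype=np.float32, count=cols)
        if chunk.size == 0:  # end of file reached
            break
        chunks.append(chunk)

x = np.vstack(chunks)  # stack the row chunks back into a 2D array
```

Each call to np.fromfile advances the file position, so the loop walks through the file row by row.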
When you specify a dtype of '(M, N)f4', you are effectively specifying the final two dimensions of the output array, e.g.
np.zeros(5, np.dtype('(6, 7)f4')).shape
# (5, 6, 7)
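As for where the limitation comes from: the error message itself points at it. NumPy requires the size in bytes of a single dtype item to fit in a C int, and a (10000, 100000) float32 subarray item is 10000 × 100000 × 4 = 4,000,000,000 bytes, which overflows a 32-bit signed int. A quick arithmetic check, plus a small subarray dtype that is well within the limit:

```python
import numpy as np

# One '(10000,100000)f4' item would occupy 4 bytes per float32 element.
item_bytes = 10000 * 100000 * 4   # 4,000,000,000 bytes
c_int_max = 2**31 - 1             # 2,147,483,647: max 32-bit signed int

# The single item is larger than a C int can represent.
assert item_bytes > c_int_max

# A small subarray dtype is fine, and its itemsize is the full subarray:
dt = np.dtype('(6,7)f4')
assert dt.itemsize == 6 * 7 * 4   # 168 bytes per item
```

So the limit applies to the size of one dtype item, not to the total array size, which is why np.zeros((10000, 100000)) works while the subarray dtype does not.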
You could achieve the same outcome by simply reading in the data as a 1D array, then reshaping it to your desired shape:
x = np.fromfile(filename, np.float32).reshape(-1, 10000, 100000)
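A runnable version of that approach, using a small hypothetical file so it executes anywhere; the real trailing dimensions would be (10000, 100000):

```python
import numpy as np

# Hypothetical small file standing in for the real binary file.
frames, rows, cols = 5, 6, 7
data = np.arange(frames * rows * cols, dtype=np.float32)
data.tofile("big.bin")  # hypothetical file name

# Read the file as a flat 1D float32 array, then reshape;
# -1 lets NumPy infer the leading dimension from the file size.
x = np.fromfile("big.bin", dtype=np.float32).reshape(-1, rows, cols)
```

Since reshape returns a view of the flat array, this costs no extra copy beyond the read itself.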