I use a custom datatype, e.g.
datatype = np.dtype('({:n},{:n})f4'.format(10000, 100000))
to read data from a binary file using
np.fromfile(filename, dtype=datatype)
However, defining the datatype with np.dtype raises an error for large shapes like the one above:
ValueError: invalid shape in fixed-type tuple: dtype size in bytes must fit into a C int
Initializing an array of that size is no problem: a = np.zeros((10000, 100000)). So my question is: where does that limitation come from, and how can I get around it? I can of course use a loop and read chunks at a time, but maybe there is a more elegant way?
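The chunked fallback mentioned above can be sketched like this. The file name and sizes are deliberately small hypothetical stand-ins so the sketch runs anywhere; the real shape in the question would be (10000, 100000):

```python
import numpy as np

# Hypothetical small sizes standing in for the real (10000, 100000) shape.
rows, cols = 6, 7
data = np.arange(rows * cols, dtype=np.float32).reshape(rows, cols)
data.tofile("chunks.bin")  # hypothetical file name

# Read one row's worth of float32 values per iteration instead of
# packing the whole 2D block into a single huge subarray dtype.
chunks = []
with open("chunks.bin", "rb") as f:
    while True:
        chunk = np.fromfile(f, dtype=np.float32, count=cols)
        if chunk.size == 0:  # end of file reached
            break
        chunks.append(chunk)

x = np.vstack(chunks)  # stack the row chunks back into a 2D array
```

Each call to np.fromfile advances the file position, so the loop walks through the file row by row.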
When you specify a dtype of '(M, N)f4', you are effectively specifying the final two dimensions of the output array, e.g.
np.zeros(5, np.dtype('(6, 7)f4')).shape
# (5, 6, 7)
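As for where the limitation comes from: the error message itself points at it. NumPy requires the size in bytes of a single dtype item to fit in a C int, and a (10000, 100000) float32 subarray item is 10000 × 100000 × 4 = 4,000,000,000 bytes, which overflows a 32-bit signed int. A quick arithmetic check, plus a small subarray dtype that is well within the limit:

```python
import numpy as np

# One '(10000,100000)f4' item would occupy 4 bytes per float32 element.
item_bytes = 10000 * 100000 * 4   # 4,000,000,000 bytes
c_int_max = 2**31 - 1             # 2,147,483,647: max 32-bit signed int

# The single item is larger than a C int can represent.
assert item_bytes > c_int_max

# A small subarray dtype is fine, and its itemsize is the full subarray:
dt = np.dtype('(6,7)f4')
assert dt.itemsize == 6 * 7 * 4   # 168 bytes per item
```

So the limit applies to the size of one dtype item, not to the total array size, which is why np.zeros((10000, 100000)) works while the subarray dtype does not.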
You could achieve the same outcome by simply reading in the data as a 1D array, then reshaping it to your desired shape:
x = np.fromfile(filename, np.float32).reshape(-1, 10000, 100000)
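A runnable version of that approach, using a small hypothetical file so it executes anywhere; the real trailing dimensions would be (10000, 100000):

```python
import numpy as np

# Hypothetical small file standing in for the real binary file.
frames, rows, cols = 5, 6, 7
data = np.arange(frames * rows * cols, dtype=np.float32)
data.tofile("big.bin")  # hypothetical file name

# Read the file as a flat 1D float32 array, then reshape;
# -1 lets NumPy infer the leading dimension from the file size.
x = np.fromfile("big.bin", dtype=np.float32).reshape(-1, rows, cols)
```

Since reshape returns a view of the flat array, this costs no extra copy beyond the read itself.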