I am trying to save a list of numpy arrays to disk so I don't have to regenerate it each time, as this takes a while. The list contains approximately 230,000 numpy arrays, each of shape (7, length), where the length varies between roughly 200 and 800.
I have tried np.save, but I get an error saying "could not broadcast input array from shape (7,158) into shape (7)". The length of the first array in the list is 158, so it is failing at the first list item. I have also tried np.savez, and also first converting the list of arrays to a pure numpy array using np.asarray(listname), but I get the same error.
What is the best way to save this list of arrays to disk so I can load and use it on demand?
A list with arrays that differ in 2nd dimension:
In [118]: alist = [np.ones((2,3)), np.zeros((2,5)), np.arange(12).reshape(2,6)]
Your error:
In [119]: np.array(alist, dtype=object)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-119-357020ce4a02> in <module>
----> 1 np.array(alist, dtype=object)
ValueError: could not broadcast input array from shape (2,3) into shape (2)
Correct way of making an object array:
In [120]: arr = np.empty(3, object)
In [121]: arr[:] = alist
In [122]: arr
Out[122]:
array([array([[1., 1., 1.],
       [1., 1., 1.]]),
       array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]]),
       array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11]])], dtype=object)
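For a long list, the same idea can be wrapped in a small helper. This is only a sketch: to_object_array is a hypothetical name, not a numpy function, and it fills the object array one element at a time, which should also cope with the case where all the arrays happen to share a shape.

```python
import numpy as np


def to_object_array(arrays):
    """Pack a list of arrays into a 1-D object array (hypothetical helper)."""
    out = np.empty(len(arrays), dtype=object)
    for i, a in enumerate(arrays):
        # Assign element by element so numpy never tries to broadcast
        # the inner arrays into a single rectangular block.
        out[i] = a
    return out


alist = [np.ones((2, 3)), np.zeros((2, 5)), np.arange(12).reshape(2, 6)]
arr = to_object_array(alist)
print(arr.shape, arr.dtype)
```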
save works:
In [123]: np.save('test.npy', arr)
In [124]: ll test.npy
-rw-rw-r-- 1 paul 708 Jul 8 20:13 test.npy
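Reading such a file back needs one extra step: an object array is stored via pickle, and np.load refuses pickled data unless allow_pickle=True is passed. A minimal round-trip sketch, assuming the file comes from a source you trust (the temporary directory is only there to keep the example self-contained):

```python
import os
import tempfile

import numpy as np

alist = [np.ones((2, 3)), np.zeros((2, 5)), np.arange(12).reshape(2, 6)]

# Build the 1-D object array, one inner array per element.
arr = np.empty(len(alist), dtype=object)
for i, a in enumerate(alist):
    arr[i] = a

path = os.path.join(tempfile.mkdtemp(), "test.npy")
np.save(path, arr)

# Object arrays are pickled on disk, so loading must opt in explicitly.
loaded = np.load(path, allow_pickle=True)
restored = list(loaded)  # back to a plain Python list of arrays
print(len(restored), restored[1].shape)
```

Only unpickle files you created yourself or otherwise trust, since loading a pickle can execute arbitrary code.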
savez works, with almost the same net file size:
In [125]: np.savez('test.npz', *arr)
In [126]: ll test.npz
-rw-rw-r-- 1 paul 972 Jul 8 20:13 test.npz
The question "Why does numpy.save produce 100MB file for sys.getsizeof 0.33MB data?" is an example where the arrays differ in the first dimension.
The basic point is that np.save writes an array, so it tries to turn a list input into an array. Building an array from arrays of differing sizes pushes the bounds of numpy. The latest version, 1.19, starts to warn us about this.
(I deleted this answer after seeing that a comment already mentions the use of np.savez with *yourlist, but am undeleting it in order to provide an example of how to read the data back in again.)
import numpy as np
list1 = [np.zeros((3,3)), np.arange(5)]
np.savez("myfile.npz", *list1)
data = np.load("myfile.npz")
list2 = [data[k] for k in data]
print(list2)
gives:
[array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]]), array([0, 1, 2, 3, 4])]
Despite the somewhat dictionary-like syntax for extracting list2 from data, data.values() is not supported, although data.items() is valid, so you could also do:
list2 = [v for k, v in data.items()]
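If the order of the list matters, it may be safer to rebuild it from the key names rather than rely on iteration order. Arrays passed positionally to np.savez are stored under the names arr_0, arr_1, and so on, so a sketch like the following recovers them by index (the temporary directory is just for a self-contained example):

```python
import os
import tempfile

import numpy as np

list1 = [np.zeros((3, 3)), np.arange(5)]
path = os.path.join(tempfile.mkdtemp(), "myfile.npz")
np.savez(path, *list1)

# The context manager closes the underlying zip file when done.
with np.load(path) as data:
    # Positional arguments to np.savez are named arr_0, arr_1, ...
    list2 = [data[f"arr_{i}"] for i in range(len(data.files))]

print([a.shape for a in list2])
```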
From experimentation, it appears that if you omit the .npz suffix on np.savez then it will be appended automatically, but if you omit the suffix on np.load then the file will not be found.
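The suffix behaviour can be checked with a short script. This is a sketch run in a fresh temporary directory; the variable names are only for illustration:

```python
import os
import tempfile

import numpy as np

base = os.path.join(tempfile.mkdtemp(), "myfile")  # note: no .npz suffix

np.savez(base, np.arange(3))
# np.savez appends ".npz" when the given name lacks it.
saved_with_suffix = os.path.exists(base + ".npz")

try:
    np.load(base)  # np.load does not add the suffix for us
    load_without_suffix_failed = False
except FileNotFoundError:
    load_without_suffix_failed = True

print(saved_with_suffix, load_without_suffix_failed)
```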