
Python: How to save lists of 2D numpy arrays of different lengths

I am trying to save a list of numpy arrays to disk so that I don't have to regenerate it each time, which takes a while. The list contains approximately 230,000 numpy arrays, each of shape 7 x length, where the length varies between roughly 200 and 800.

I have tried np.save, but I get an error saying "could not broadcast input array from shape (7,158) into shape (7)". The length of the first array in the list is 158, so it is failing at the first list item. I have also tried np.savez, and first converting the list of arrays to a single numpy array using np.asarray(listname), but I get the same error.

What is the best way to save this list of arrays to disk so I can load and use it on demand?

A list with arrays that differ in 2nd dimension:

In [118]: alist = [np.ones((2,3)), np.zeros((2,5)), np.arange(12).reshape(2,6)]                      

Your error:

In [119]: np.array(alist, dtype=object)                                                              
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-119-357020ce4a02> in <module>
----> 1 np.array(alist, dtype=object)

ValueError: could not broadcast input array from shape (2,3) into shape (2)

The correct way to make an object array:

In [120]: arr = np.empty(3, object)                                                                  
In [121]: arr[:] = alist                                                                             
In [122]: arr                                                                                        
Out[122]: 
array([array([[1., 1., 1.],
       [1., 1., 1.]]),
       array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]]),
       array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11]])], dtype=object)

save works:

In [123]: np.save('test.npy', arr)                                                                   
In [124]: ll test.npy                                                                                
-rw-rw-r-- 1 paul 708 Jul  8 20:13 test.npy
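
Reading the file back needs one extra step: because arr has dtype=object, np.save stores its elements with pickle, and np.load only unpickles when allow_pickle=True is passed (it defaults to False in recent numpy versions). A minimal sketch of the round trip, reusing alist from above; the temporary directory is an assumption, used only to keep the example self-contained:

```python
import os
import tempfile

import numpy as np

alist = [np.ones((2, 3)), np.zeros((2, 5)), np.arange(12).reshape(2, 6)]

# Build the 1-D object array as above, one slot per sub-array.
arr = np.empty(len(alist), dtype=object)
arr[:] = alist

path = os.path.join(tempfile.mkdtemp(), "test.npy")
np.save(path, arr)

# Object arrays are stored via pickle, so loading must opt in explicitly.
loaded = np.load(path, allow_pickle=True)
restored = list(loaded)  # back to a plain list of 2-D arrays
```

Only enable allow_pickle for files you created yourself, since unpickling untrusted data can execute arbitrary code.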

savez works, with almost the same net file size:

In [125]: np.savez('test.npz', *arr)                                                                 
In [126]: ll test.npz                                                                                
-rw-rw-r-- 1 paul 972 Jul  8 20:13 test.npz
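
With np.savez, arrays passed positionally are stored under the generated names arr_0, arr_1, and so on, and each one is an ordinary numeric array, so no pickling is involved. A short sketch of reading them back in the original order by indexing on those names; the temporary path is an assumption for illustration:

```python
import os
import tempfile

import numpy as np

alist = [np.ones((2, 3)), np.zeros((2, 5)), np.arange(12).reshape(2, 6)]

path = os.path.join(tempfile.mkdtemp(), "test.npz")
np.savez(path, *alist)  # positional arrays are named arr_0, arr_1, ...

with np.load(path) as data:
    names = list(data.files)  # names of the stored arrays
    # Index by the generated name so the original list order is explicit.
    restored = [data[f"arr_{i}"] for i in range(len(names))]
```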

The question "Why does numpy.save produce 100MB file for sys.getsizeof 0.33MB data?" is an example where the arrays differ in the first dimension.

The basic point is that np.save writes an array, so it first tries to turn a list input into an array. Building an array from arrays of differing sizes pushes the bounds of numpy. Version 1.19, the latest at the time of writing, starts to warn about this.
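
Since np.save and np.savez are most at home with regular arrays, one alternative (not taken from the answers here, just a sketch) is to avoid the ragged structure altogether: when all arrays share their first dimension, as in the question, they can be concatenated along the varying axis and stored together with the split positions. The stand-in data, the key names flat and ends, and the temporary path are all assumptions for illustration:

```python
import os
import tempfile

import numpy as np

# Stand-in data: three arrays of shape (7, length) with differing lengths.
rng = np.random.default_rng(0)
arrays = [rng.random((7, n)) for n in (158, 200, 431)]

# Join along the varying axis and remember where each piece ends.
flat = np.concatenate(arrays, axis=1)
ends = np.cumsum([a.shape[1] for a in arrays])

path = os.path.join(tempfile.mkdtemp(), "flat.npz")
np.savez(path, flat=flat, ends=ends)

with np.load(path) as data:
    # np.split cuts before each index; the final end is dropped because
    # it would only produce an empty trailing piece.
    restored = np.split(data["flat"], data["ends"][:-1], axis=1)
```

This writes two regular arrays instead of one zip entry per list item, which may matter with a list as long as the one in the question.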

(I deleted this answer after seeing that a comment already mentions the use of np.savez with *yourlist, but am undeleting it in order to provide an example of how to read the data back in again.)

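import numpy as np

list1 = [np.zeros((3,3)), np.arange(5)]

# Each positional array is stored under an automatic name (arr_0, arr_1, ...)
np.savez("myfile.npz", *list1)

# np.load returns a dict-like NpzFile; iterating over it yields the keys
data = np.load("myfile.npz")
list2 = [data[k] for k in data]

print(list2)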

gives:

[array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]]), array([0, 1, 2, 3, 4])]

Despite the somewhat dictionary-like syntax for extracting list2 from data, data.values() is not supported (at least in the numpy version I tried), although data.items() is valid, so you could also do:

list2 = [v for k, v in data.items()]

From experimentation, it appears that if you omit the .npz suffix on np.savez then it will be appended automatically, but if you omit the suffix on np.load then the file will not be found.
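
The suffix behaviour described above can be checked with a short sketch. The base path inside a temporary directory and the two flag variables are assumptions for illustration:

```python
import os
import tempfile

import numpy as np

base = os.path.join(tempfile.mkdtemp(), "myfile")  # note: no .npz suffix

np.savez(base, np.zeros((3, 3)), np.arange(5))

# np.savez appended the suffix, so only "myfile.npz" exists on disk.
suffix_added = os.path.exists(base + ".npz") and not os.path.exists(base)

# np.load does not guess the suffix, so the bare name is not found.
try:
    np.load(base)
    load_without_suffix_failed = False
except FileNotFoundError:
    load_without_suffix_failed = True

with np.load(base + ".npz") as data:
    list2 = [data[k] for k in data.files]
```

Passing the full file name, suffix included, to both calls avoids the asymmetry.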
