
Creating 3D numpy.ndarray with no fixed second dimension

Sometimes data, such as speech data, have a known number of observations (n), an unknown duration, and a known number of measurements (k).

In the 2D case in NumPy, it is clear how data with a known number of observations (n) and an unknown duration is represented with an ndarray of shape (n,). For example:

import numpy as np

x = np.array([ [ 1, 2 ],
               [ 1, 2, 3 ]
             ])

print(x.shape) ### Returns: (2, )

Is there an equivalent for the 3D case in NumPy, where we could have an ndarray of shape (n, , k)? The best alternative to this I can think of is to have a 2D ndarray of shape (n,) and have each element also be 2D with a (transpose) shape of (k,). For example,

import numpy as np

x = np.array([ [ [1,2], [1,2] ],
               [ [1,2], [1,2], [1,2] ]
             ])

print(x.shape) ### Returns: (2, ); Desired: (2, , 2)

Ideally, a solution would be able to tell us the dimensionality properties of an ndarray without the need for a recursive call (maybe with an alternative to shape?).

You seem to have misunderstood what a shape of (2,) means. It doesn't mean (2, <unknown>); the comma is not a separator between 2 and some sort of blank dimension. (2,) is the Python syntax for a one-element tuple whose one element is 2. Python uses this syntax because (2) would mean the integer 2, not a tuple.
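The tuple syntax described above can be checked directly. This is a minimal sketch; the variable names are only for illustration.

```python
import numpy as np

one_tuple = (2,)   # trailing comma: a tuple with a single element
just_int = (2)     # parentheses alone: this is simply the integer 2

a = np.array([10, 20])   # an ordinary one-dimensional array

print(type(one_tuple), type(just_int))
print(a.shape, a.ndim, len(a.shape))   # the shape tuple has one entry, so ndim is 1
```

The length of the shape tuple always equals `ndim`, so a shape of `(2,)` describes exactly one dimension.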

You are not creating a two-dimensional array with an arbitrary-length second dimension. You are creating a one-dimensional array of object dtype. Its elements are ordinary Python lists. An array like this is incompatible with almost every useful thing in NumPy.
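As a rough illustration of why such an array is awkward to work with, the sketch below builds the same data with an explicit `dtype=object` (passed here so the construction does not depend on the NumPy version) and tries two arithmetic operations. NumPy simply forwards each operation to the Python lists it holds.

```python
import numpy as np

# Explicit dtype=object: a 1-D array whose two elements are Python lists.
x = np.array([[1, 2], [1, 2, 3]], dtype=object)

print(x.ndim, x.shape, x.dtype)   # one dimension, object dtype
print(type(x[0]))                 # each element is a plain list

# "Multiplication" is delegated to list.__mul__, so it repeats the lists
# instead of scaling the numbers.
doubled = x * 2
print(doubled[0])

# Addition with a scalar is delegated to list + int, which Python rejects.
try:
    x + 1
    add_failed = False
except TypeError:
    add_failed = True
print(add_failed)
```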

There is no way to create NumPy arrays with variable-length dimensions, whether in the 2D case you thought worked, or in the 3D case you're trying to make work.
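One common workaround, which is not part of the answer above and is offered only as a sketch, is to pad every sequence to the longest duration and keep the true lengths alongside. The names `seqs`, `padded`, `lengths` and `mask` are hypothetical, and zero is an arbitrary choice of padding value.

```python
import numpy as np

k = 2                                        # measurements per time step
seqs = [np.ones((2, k)), np.ones((3, k))]    # durations 2 and 3

n = len(seqs)
max_t = max(s.shape[0] for s in seqs)

# A regular 3-D array of shape (n, max_t, k), zero-padded past each true length.
padded = np.zeros((n, max_t, k))
lengths = np.array([s.shape[0] for s in seqs])
for i, s in enumerate(seqs):
    padded[i, :s.shape[0]] = s

# Boolean mask of shape (n, max_t): True where a time step holds real data.
mask = np.arange(max_t)[None, :] < lengths[:, None]

print(padded.shape, lengths, mask.sum(axis=1))
```

This gives a true numeric array with a well-defined `shape`, at the cost of extra memory and of having to apply `mask` or `lengths` in later computations.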

Just to review the 1d case:

In [33]: x = np.array([[1,2],[1,2,3]])                                          
In [34]: x.shape                                                                
Out[34]: (2,)
In [35]: x                                                                      
Out[35]: array([list([1, 2]), list([1, 2, 3])], dtype=object)

The result is a 2-element array of lists, whereas we started with a list of lists. Not much difference.

But note that if the lists are same size, np.array creates a numeric 2d array:

In [36]: x = np.array([[1,2,4],[1,2,3]])                                        
In [37]: x                                                                      
Out[37]: 
array([[1, 2, 4],
       [1, 2, 3]])

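So don't count on the behavior we see in [33]. It only occurs when the lists differ in length, and recent NumPy versions (1.24 and later) raise a ValueError for such ragged input unless dtype=object is passed explicitly.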
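To avoid depending on whether the lengths happen to match, an object array can be allocated first and filled element by element. The helper below, `to_object_array`, is a hypothetical name used only for this sketch.

```python
import numpy as np

def to_object_array(seqs):
    """Hypothetical helper: return a 1-D object array with one element per sequence."""
    out = np.empty(len(seqs), dtype=object)
    for i, s in enumerate(seqs):
        out[i] = s          # stores the sequence itself, whatever its length
    return out

ragged = to_object_array([[1, 2], [1, 2, 3]])
equal = to_object_array([[1, 2, 4], [1, 2, 3]])

# Both are 1-D object arrays, regardless of the element lengths.
print(ragged.shape, ragged.dtype)
print(equal.shape, equal.dtype)
```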

I could create a 2d object array:

In [59]: x = np.empty((2,2),object)                                             
In [60]: x                                                                      
Out[60]: 
array([[None, None],                  # in this case filled with None
       [None, None]], dtype=object)

I can assign each element with a different kind and size of object:

In [61]: x[0,0] = np.arange(3)                                                  
In [62]: x[0,0] = [1,2,3]                                                       
In [63]: x[1,0] = 'abc'                                                         
In [64]: x[1,1] = np.arange(6).reshape(2,3)                                     
In [65]: x                                                                      
Out[65]: 
array([[list([1, 2, 3]), None],
       ['abc', array([[0, 1, 2],
       [3, 4, 5]])]], dtype=object)

It is still 2d. For most purposes it is like a list or list of lists, containing objects. The data buffer actually holds pointers to objects stored elsewhere in memory (just as a list's buffer does).

There really isn't such a thing as a 3d array with a variable last dimension. At best we can get a 2d array that contains lists or arrays of various sizes.
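For the speech-data case from the question, this could look like a 1-D object array whose elements are each a numeric array of shape (duration, k). The sketch below uses made-up durations and zero-filled data; the per-element shapes are collected with one flat loop rather than a recursive call.

```python
import numpy as np

k = 2                    # known number of measurements
durations = [2, 3]       # made-up durations, one per observation

# 1-D object array; element i is an ordinary (durations[i], k) float array.
utterances = np.empty(len(durations), dtype=object)
for i, t in enumerate(durations):
    utterances[i] = np.zeros((t, k))

n = utterances.shape[0]                     # the outer array only knows n
shapes = [u.shape for u in utterances]      # inner shapes, one pass, no recursion
same_k = all(s[1] == k for s in shapes)     # check the last dimension is consistent

print(n, shapes, same_k)
```

The outer `shape` still reports only `(n,)`; the duration and k of each observation live on the inner arrays.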


Make a list of two 2d arrays:

In [69]: alist = [np.arange(6).reshape(2,3), np.arange(4.).reshape(2,2)]        
In [70]: alist                                                                  
Out[70]: 
[array([[0, 1, 2],
        [3, 4, 5]]), array([[0., 1.],
        [2., 3.]])]

In this case, giving it to np.array raises an error:

In [71]: np.array(alist)
---------------------------------------------------------------------------
ValueError: could not broadcast input array from shape (2,3) into shape (2)

We could fill an object array with elements from this list:

In [72]: x = np.empty((4,),object)                                              
In [73]: x[0]=alist[0][0]                                                       
In [74]: x[1]=alist[0][1]                                                       
In [75]: x[2]=alist[1][0]                                                       
In [76]: x[3]=alist[1][1]                                                       
In [77]: x                                                                      
Out[77]: 
array([array([0, 1, 2]), array([3, 4, 5]), array([0., 1.]),
       array([2., 3.])], dtype=object)

and reshape it to 2d:

In [78]: x.reshape(2,2)                                                         
Out[78]: 
array([[array([0, 1, 2]), array([3, 4, 5])],
       [array([0., 1.]), array([2., 3.])]], dtype=object)

The result is a 2d array containing 1d arrays. To get the shapes of the elements I have to do something like:

In [87]: np.frompyfunc(lambda i:i.shape, 1,1)(Out[78])                          
Out[87]: 
array([[(3,), (3,)],
       [(2,), (2,)]], dtype=object)
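`Out[78]` only exists inside an IPython session. A script version of the same idea might look like the sketch below, which rebuilds the 2d object array under the name `x` and also shows a plain nested comprehension for comparison.

```python
import numpy as np

# Rebuild the 2x2 object array of 1-D arrays from the session above.
x = np.empty((2, 2), dtype=object)
x[0, 0] = np.array([0, 1, 2])
x[0, 1] = np.array([3, 4, 5])
x[1, 0] = np.array([0., 1.])
x[1, 1] = np.array([2., 3.])

# frompyfunc wraps a Python function so it is applied to every element
# and the results are returned in an object array of the same shape.
get_shape = np.frompyfunc(lambda a: a.shape, 1, 1)
shapes = get_shape(x)

# The same information as nested Python lists.
shapes_list = [[a.shape for a in row] for row in x]

print(shapes)
print(shapes_list)
```

Either way, each element is visited individually in Python, so this is a convenience rather than a vectorized operation.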
