
How to convert a tuple of depth 2 to a 2D Numpy array?

The following code does not do what I want, which is to convert each tuple inside a tuple to a NumPy array, so that I can retrieve the values with multiple indexes.

import numpy as np
a = np.asarray([[1, 2, 3], [2, 3, 4, 5]], dtype=object)  # NumPy >= 1.24 requires dtype=object for ragged rows
print(a[0, 0])

Output is the error:

IndexError: too many indices 

What I want is to retrieve 1, because the first value of the first inner tuple is 1. How can I make this conversion happen?
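A minimal sketch of what is going on, assuming NumPy 1.24 or later (where ragged input has to be requested explicitly with dtype=object): the ragged input becomes a 1-D object array whose two elements are Python lists, so a single two-part index fails, while chained indexing reaches the value.

```python
import numpy as np

# Rows of different lengths cannot form a 2-D numeric array, so NumPy
# stores each row as a Python object in a 1-D array.
a = np.asarray([[1, 2, 3], [2, 3, 4, 5]], dtype=object)
print(a.shape, a.dtype)  # (2,) object

# a[0, 0] raises IndexError: the array has only one dimension.
try:
    a[0, 0]
except IndexError as exc:
    print("IndexError:", exc)

# Chained indexing works: a[0] is the list [1, 2, 3], and [0] picks its first item.
first = a[0][0]
print(first)  # 1
```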

Update: Interestingly, when I try something like this:

a=np.asarray([np.asarray([1,2,3]),np.asarray([2,3,4,5])], dtype=object)
b=np.asarray([np.asarray([1,2,3]),np.asarray([2,3,4,5])], dtype=object)
print(np.multiply(a,b))

That generates the desired output, which is element-by-element multiplication:

[array([1, 4, 9]) array([ 4,  9, 16, 25])]
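Why the update works, as far as I can tell: for an object array, np.multiply applies Python's * to each pair of elements, and here each pair happens to be two NumPy arrays of the same length. A sketch showing that it is equivalent to a plain loop over the rows (again assuming NumPy 1.24+, hence the explicit dtype=object):

```python
import numpy as np

a = np.asarray([np.asarray([1, 2, 3]), np.asarray([2, 3, 4, 5])], dtype=object)
b = np.asarray([np.asarray([1, 2, 3]), np.asarray([2, 3, 4, 5])], dtype=object)

# Object-dtype ufunc: computes a[i] * b[i] for each i, giving an object array of arrays.
prod = np.multiply(a, b)

# The same computation written as an explicit Python loop over the rows.
loop = [x * y for x, y in zip(a, b)]

for p, q in zip(prod, loop):
    assert np.array_equal(p, q)
print(prod.dtype)  # object
```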

You can't convert your example directly to a 2-D NumPy array because the rows have differing lengths. The result you are getting is a 1-D NumPy array (dtype=object) which holds Python list objects. I've seen this kind of structure referred to as a jagged array, but I'm not sure whether that's an official term.

You could pad the elements with zeros, use a sparse matrix, or simply not convert to NumPy at all. It depends on your overall goal.
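A minimal sketch of the zero-padding option, assuming that 0 is a safe filler for your data (it is not if real zeros must be told apart from padding). The padded result is an ordinary 2-D integer array, so padded[0, 0] works; the name rows is just illustrative.

```python
import numpy as np

rows = [[1, 2, 3], [2, 3, 4, 5]]

# Allocate a zero-filled array wide enough for the longest row.
width = max(len(r) for r in rows)
padded = np.zeros((len(rows), width), dtype=int)

# Copy each row into the left part of its slot; the rest stays 0.
for i, r in enumerate(rows):
    padded[i, :len(r)] = r

print(padded)
# [[1 2 3 0]
#  [2 3 4 5]]
print(padded[0, 0])  # 1
```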

To get you started, here's how you can set up a masked array from a jagged array and compute the sum along an axis. Someone who uses this module more than I do may be able to suggest something more efficient or idiomatic:

>>> import numpy as np
>>> a = np.array([[[1,2,3],[2,3,4,5], [2, 2]],[[3,4,5,6,7],[1],[2,3,10]]], dtype=object)
>>> D = max(len(x) for y in a for x in y)  # longest innermost list
>>> padded = [[x + [0] * (D-len(x)) for x in y] for y in a]
>>> mask = [[[0] * len(x) + [1] * (D-len(x)) for x in y] for y in a]
>>> result = np.ma.masked_array(padded, np.array(mask, dtype=bool))
>>> result
masked_array(data =
 [[[1 2 3 -- --]
  [2 3 4 5 --]
  [2 2 -- -- --]]

 [[3 4 5 6 7]
  [1 -- -- -- --]
  [2 3 10 -- --]]],
             mask =
 [[[False False False  True  True]
  [False False False False  True]
  [False False  True  True  True]]

 [[False False False False False]
  [False  True  True  True  True]
  [False False False  True  True]]],
       fill_value = 999999)

>>> np.sum(result, axis=-1)
masked_array(data =
 [[6 14 4]
 [25 1 15]],
             mask =
 [[False False False]
 [False False False]],
       fill_value = 999999)

>>> 
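One possibly more idiomatic variation (a sketch, not taken from the answer above): record the row lengths once and build the mask by broadcasting np.arange(D) against them, instead of building nested Python lists of 0s and 1s. The names rows, lengths and data are illustrative.

```python
import numpy as np

rows = [[[1, 2, 3], [2, 3, 4, 5], [2, 2]],
        [[3, 4, 5, 6, 7], [1], [2, 3, 10]]]

# Length of every innermost list, shape (2, 3).
lengths = np.array([[len(x) for x in y] for y in rows])
D = int(lengths.max())

# Zero-padded data, shape (2, 3, D).
data = np.zeros(lengths.shape + (D,), dtype=int)
for i, y in enumerate(rows):
    for j, x in enumerate(y):
        data[i, j, :len(x)] = x

# True where a position lies beyond the end of its row, i.e. is padding.
mask = np.arange(D) >= lengths[..., np.newaxis]

result = np.ma.masked_array(data, mask=mask)
print(result.sum(axis=-1))
```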

If I change your a and b so that NumPy makes a 2-D array instead of an array of arrays:

In [5]: am=np.asarray([np.asarray([1,2,3,0]),np.asarray([2,3,4,5])])
#array([[1, 2, 3, 0],
#       [2, 3, 4, 5]])
In [7]: bm=np.asarray([np.asarray([1,2,3,0]),np.asarray([2,3,4,5])])

and do timings:

In [10]: timeit np.multiply(a,b)
100000 loops, best of 3: 7.94 us per loop

In [11]: timeit np.multiply(am,bm)
100000 loops, best of 3: 1.89 us per loop

The pure ndarray multiplication is substantially faster. With the 2-D numeric array, NumPy can jump directly into element-by-element multiplication at the fast C code level; with the array of arrays, it has to do general-purpose iteration, working with Python objects rather than simple numbers. That is close to iterating in Python.
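The difference shows up in the dtypes. A small sketch (assuming NumPy 1.24+, so the ragged version needs an explicit dtype=object): the padded am is a 2-D integer array, whereas the ragged a is a 1-D object array whose elements are themselves arrays, so each operation on it goes through Python-level calls on those elements.

```python
import numpy as np

a = np.asarray([np.asarray([1, 2, 3]), np.asarray([2, 3, 4, 5])], dtype=object)
am = np.asarray([np.asarray([1, 2, 3, 0]), np.asarray([2, 3, 4, 5])])

print(a.shape, a.dtype)          # (2,) object
print(am.shape, am.dtype.kind)   # (2, 4) i

# Multiplying the object array yields another object array holding ndarrays...
print(type((a * a)[0]))          # <class 'numpy.ndarray'>
# ...while the numeric array yields a single 2-D integer array.
print((am * am).dtype.kind)      # i
```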

In fact, if I write that loop explicitly, I get something close to the longer time:

al,bl=a.tolist(), b.tolist()
In [21]: timeit np.array([np.multiply(x,y) for x,y in zip(al,bl)])
100000 loops, best of 3: 8.99 us per loop
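The timeit commands above are IPython magics. Here is a sketch of the same comparison as a plain script using the standard-library timeit module; the absolute numbers depend on the machine and NumPy version, so none are quoted. The explicit loop is kept as a list because wrapping a ragged result in np.array needs dtype=object on recent NumPy.

```python
import timeit
import numpy as np

a = np.asarray([np.asarray([1, 2, 3]), np.asarray([2, 3, 4, 5])], dtype=object)
b = np.asarray([np.asarray([1, 2, 3]), np.asarray([2, 3, 4, 5])], dtype=object)
am = np.asarray([np.asarray([1, 2, 3, 0]), np.asarray([2, 3, 4, 5])])
bm = am.copy()
al, bl = a.tolist(), b.tolist()  # lists of the row arrays

n = 10_000
cases = {
    "object array": lambda: np.multiply(a, b),
    "2-D int array": lambda: np.multiply(am, bm),
    "explicit loop": lambda: [np.multiply(x, y) for x, y in zip(al, bl)],
}
for name, fn in cases.items():
    # timeit returns total seconds for n calls; convert to microseconds per call.
    per_call_us = timeit.timeit(fn, number=n) / n * 1e6
    print(f"{name}: {per_call_us:.2f} us per loop")
```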

Now let's look at your 'sum on the last dimension' problem. Notice first that sum (or add.reduce) has not been extended to work with this type of array, so the ragged case needs an explicit loop over the rows.

In [37]: timeit am.sum(axis=1)
100000 loops, best of 3: 11.5 us per loop

In [38]: timeit [x.sum() for x in a]
10000 loops, best of 3: 21.5 us per loop
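A short sketch of what 'not extended' means in practice (assuming NumPy 1.24+ for the explicit dtype=object): a.sum() on the object array adds the elements themselves with Python's +, i.e. it tries a[0] + a[1], which fails because the rows have different lengths. Per-row sums need the loop.

```python
import numpy as np

a = np.asarray([np.asarray([1, 2, 3]), np.asarray([2, 3, 4, 5])], dtype=object)

try:
    a.sum()  # evaluates a[0] + a[1]: shapes (3,) and (4,) cannot broadcast
    sum_failed = False
except ValueError as exc:
    sum_failed = True
    print("ValueError:", exc)

# Summing within each row requires iterating over the rows in Python.
row_sums = [x.sum() for x in a]
print(row_sums)
```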

The speed advantage of the ndarray sum isn't as great here. The sum can be sped up by coding it as a dot product (with np.dot or einsum):

In [42]: timeit np.einsum('ij->i',am)
100000 loops, best of 3: 4.79 us per loop

In [50]: ones=np.array([1,1,1,1])
In [51]: timeit np.dot(am,ones)
100000 loops, best of 3: 2.37 us per loop

In [55]: timeit [np.einsum('j->',x) for x in a]
100000 loops, best of 3: 12.3 us per loop

In [64]: c=np.asarray([np.asarray([1,1,1]),np.asarray([1,1,1,1])])   
In [65]: timeit [np.dot(x,y) for x,y in zip(a,c)]
100000 loops, best of 3: 8.12 us per loop
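A sketch confirming that these formulations compute the same row sums: on the padded 2-D array, the ones vector has to match the row width; in the ragged case, each row gets a ones vector of its own length (built here with a list comprehension rather than the answer's hand-written c).

```python
import numpy as np

am = np.asarray([[1, 2, 3, 0], [2, 3, 4, 5]])
a = np.asarray([np.asarray([1, 2, 3]), np.asarray([2, 3, 4, 5])], dtype=object)

# Row sums of the 2-D array, three ways.
s_sum = am.sum(axis=1)
s_einsum = np.einsum('ij->i', am)  # sum over j for each i
s_dot = np.dot(am, np.ones(am.shape[1], dtype=am.dtype))  # matrix-vector product with ones

# Ragged version: one dot product per row, with a ones vector of matching length.
c = [np.ones(len(x), dtype=x.dtype) for x in a]
s_ragged = [np.dot(x, y) for x, y in zip(a, c)]

print(s_sum, s_einsum, s_dot)
print(s_ragged)
```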

So while it is possible to construct ragged arrays (arrays of arrays), they don't have a substantial speed advantage over lists of arrays. The fast NumPy array operations do not, in general, work with elements that are general-purpose Python objects (dtype=object).
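One layout the answers above do not cover, offered only as a hedged alternative: keep all values in one flat numeric array plus the start offset of each row, so that per-row sums run in compiled code via np.add.reduceat. This sketch assumes every row is non-empty (for an empty row, reduceat returns the element at that offset instead of 0); the names values, lengths and offsets are illustrative.

```python
import numpy as np

rows = [[1, 2, 3], [2, 3, 4, 5]]

# Flat numeric storage plus the index where each row starts.
values = np.concatenate([np.asarray(r) for r in rows])
lengths = np.array([len(r) for r in rows])
offsets = np.concatenate(([0], np.cumsum(lengths)[:-1]))

# Sum each segment values[offsets[i]:offsets[i+1]] in one vectorised call.
row_sums = np.add.reduceat(values, offsets)
print(row_sums)

# Element-wise operations work directly on the flat array.
squares = values * values
first_row_squares = squares[offsets[0]:offsets[0] + lengths[0]]
```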

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.
