The following code does not do what I want, which is to convert each tuple inside a tuple to a NumPy array so that I can retrieve values with multiple indexes.
import numpy as np
a=np.asarray([[1,2,3],[2,3,4,5]])
print a
Output is the error:
IndexError: too many indices
However, what I want it to retrieve is 1, because the first tuple's first tuple's first value is 1. How can I make such a conversion?
Update: Interestingly when I try something like:
a=np.asarray([np.asarray([1,2,3]),np.asarray([2,3,4,5])])
b=np.asarray([np.asarray([1,2,3]),np.asarray([2,3,4,5])])
print np.multiply(a,b)
That generates the desired output, which is element-by-element multiplication:
[array([1, 4, 9]) array([ 4, 9, 16, 25])]
You can't convert your example directly to a NumPy array because the rows have differing lengths. The result you are getting is a 1d NumPy array which holds Python list objects. I've seen what you're trying to do referred to as a jagged array, though I'm not sure that's any kind of official term.
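Here is a small sketch of what that 1d object array looks like, and why a multi-index lookup fails while chained indexing works. The `dtype=object` argument is an addition on my part: recent NumPy releases refuse to build an array from ragged input without it.

```python
import numpy as np

# Assumption: dtype=object is passed explicitly, since recent NumPy
# versions raise an error for ragged nested lists otherwise.
a = np.asarray([[1, 2, 3], [2, 3, 4, 5]], dtype=object)

shape = a.shape               # a single axis holding two Python lists
first_row_type = type(a[0])   # each element is a plain list
first_value = a[0][0]         # chained indexing: array element, then list item

# Multi-index lookup asks for a second axis that does not exist.
try:
    a[0, 0]
    multi_index_error = None
except IndexError as exc:
    multi_index_error = exc

print(shape, first_row_type, first_value, multi_index_error)
```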
You could pad the elements with zeros, use a sparse matrix, or simply not convert to NumPy. It depends on your overall goal.
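As a minimal sketch of the zero-padding option (the helper name `pad_rows` is made up for illustration and is not a NumPy function), each row can be copied into the left part of a preallocated 2d array:

```python
import numpy as np

def pad_rows(rows, fill_value=0):
    """Hypothetical helper: right-pad ragged rows into a regular 2d array."""
    width = max(len(row) for row in rows)
    padded = np.full((len(rows), width), fill_value)
    for i, row in enumerate(rows):
        # Copy the real values; the remaining slots keep fill_value.
        padded[i, :len(row)] = row
    return padded

padded = pad_rows([[1, 2, 3], [2, 3, 4, 5]])
print(padded)
print(padded[0, 0])  # multi-index access now works on the regular array
```

The trade-off is that the padding value becomes indistinguishable from real data, so this only suits operations where the fill value is neutral (for example, zeros under a sum).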
To get you started, here's how you can set up a masked array from a jagged array and compute the sum along an axis. Someone who uses this module more than me may be able to suggest something more efficient or idiomatic:
>>> a = np.array([[[1,2,3],[2,3,4,5], [2, 2]],[[3,4,5,6,7],[1],[2,3,10]]], dtype=object)
>>> D = max(len(x) for y in a for x in y)
>>> padded = [[x + [0] * (D-len(x)) for x in y] for y in a]
>>> mask = [[[0] * len(x) + [1] * (D-len(x)) for x in y] for y in a]
>>> result = np.ma.masked_array(padded, np.array(mask, dtype=bool))
>>> result
masked_array(data =
[[[1 2 3 -- --]
[2 3 4 5 --]
[2 2 -- -- --]]
[[3 4 5 6 7]
[1 -- -- -- --]
[2 3 10 -- --]]],
mask =
[[[False False False True True]
[False False False False True]
[False False True True True]]
[[False False False False False]
[False True True True True]
[False False False True True]]],
fill_value = 999999)
>>> np.sum(result, axis=-1)
masked_array(data =
[[6 14 4]
[25 1 15]],
mask =
[[False False False]
[False False False]],
fill_value = 999999)
>>>
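One reason to prefer the mask over plain zero padding is that reductions such as the mean then skip the padded slots instead of counting them. A small sketch on the two-row example from the question (variable names are my own):

```python
import numpy as np

rows = [[1, 2, 3], [2, 3, 4, 5]]
width = max(len(r) for r in rows)

# Pad with zeros and record which slots are padding (True = masked).
padded = [r + [0] * (width - len(r)) for r in rows]
mask = [[False] * len(r) + [True] * (width - len(r)) for r in rows]

masked = np.ma.masked_array(padded, mask=mask)

masked_means = masked.mean(axis=-1)           # padding is ignored
plain_means = np.array(padded).mean(axis=-1)  # padding counts as a real 0

print(masked_means)
print(plain_means)
```

For the first row the masked mean is taken over three values, while the plain padded mean divides by four.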
If I change your a and b so that numpy makes a 2d array, instead of an array of arrays:
In [5]: am=np.asarray([np.asarray([1,2,3,0]),np.asarray([2,3,4,5])])
#array([[1, 2, 3, 0],
# [2, 3, 4, 5]])
In [7]: bm=np.asarray([np.asarray([1,2,3,0]),np.asarray([2,3,4,5])])
and do timings:
In [10]: timeit np.multiply(a,b)
100000 loops, best of 3: 7.94 us per loop
In [11]: timeit np.multiply(am,bm)
100000 loops, best of 3: 1.89 us per loop
The pure ndarray multiplication is substantially faster. In one case it can jump directly into doing element-by-element multiplication (at the fast C code level); in the other it is doing general-purpose iteration, working with objects rather than simple numbers. It is doing something close to iterating in Python.
In fact, if I do that loop explicitly, I get something close to that longer time:
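The difference between the two cases shows up in the dtype. The sketch below builds the object array element by element (my choice, to avoid depending on how a particular NumPy version treats ragged input) and compares it with the regular 2d array:

```python
import numpy as np

# Array of arrays: two slots, each holding its own ndarray.
a = np.empty(2, dtype=object)
a[0] = np.array([1, 2, 3])
a[1] = np.array([2, 3, 4, 5])

# Regular 2d array, with the first row padded by a zero.
am = np.array([[1, 2, 3, 0], [2, 3, 4, 5]])

print(a.dtype, a.shape)    # object dtype, one axis
print(am.dtype, am.shape)  # integer dtype, two axes

# On the object array, multiply applies `*` to each pair of elements.
ragged_product = np.multiply(a, a)
# On the numeric array, the whole product runs in compiled code.
dense_product = np.multiply(am, am)

print(ragged_product)
print(dense_product)
```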
al,bl=a.tolist(), b.tolist()
In [21]: timeit np.array([np.multiply(x,y) for x,y in zip(al,bl)])
100000 loops, best of 3: 8.99 us per loop
Now let's look at your 'sum on the last dimension' problem. Notice first that sum (or add.reduce) has not been extended to work with this type of array.
In [37]: timeit am.sum(axis=1)
100000 loops, best of 3: 11.5 us per loop
In [38]: timeit [x.sum() for x in a]
10000 loops, best of 3: 21.5 us per loop
The speed advantage of the ndarray sum isn't as great. sum can be sped up by coding it as a dot product (with np.dot or einsum):
In [42]: timeit np.einsum('ij->i',am)
100000 loops, best of 3: 4.79 us per loop
In [50]: ones=np.array([1,1,1,1])
In [51]: timeit np.dot(am,ones)
100000 loops, best of 3: 2.37 us per loop
In [55]: timeit [np.einsum('j->',x) for x in a]
100000 loops, best of 3: 12.3 us per loop
In [64]: c=np.asarray([np.asarray([1,1,1]),np.asarray([1,1,1,1])])
In [65]: timeit [np.dot(x,y) for x,y in zip(a,c)]
100000 loops, best of 3: 8.12 us per loop
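For reference, here is a quick check that these formulations compute the same row sums; it only compares results, not speed, and the variable names are mine:

```python
import numpy as np

am = np.array([[1, 2, 3, 0], [2, 3, 4, 5]])
ones = np.ones(am.shape[1], dtype=am.dtype)

by_sum = am.sum(axis=1)
by_einsum = np.einsum('ij->i', am)  # sum over the j axis
by_dot = np.dot(am, ones)           # dot with a vector of ones

# Ragged version: one dot product per row, each with a matching ones vector.
rows = [np.array([1, 2, 3]), np.array([2, 3, 4, 5])]
ragged_sums = [int(np.dot(x, np.ones(len(x), dtype=x.dtype))) for x in rows]

print(by_sum, by_einsum, by_dot, ragged_sums)
```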
So while it is possible to construct ragged arrays (or arrays of arrays), they don't have a substantial speed advantage over lists of arrays. The fast numpy array operations do not, in general, work with elements that are general-purpose Python objects (dtype=object).