For a regular matrix, NumPy arrays have methods like ndarray.flatten():
np.array([[1,2,3],[4,5,6],[7,8,9]]).flatten()
gives [1,2,3,4,5,6,7,8,9]
What if I wanted to get from np.array([[1,2,3],[4,5,6],7]) to [1,2,3,4,5,6,7]?
Is there an existing function that does something like that?
With uneven lists, the array has object dtype (and is 1d, so flatten doesn't change it). Note that recent NumPy versions only build such an array if you pass dtype=object explicitly.
In [96]: arr=np.array([[1,2,3],[4,5,6],7])
In [97]: arr
Out[97]: array([[1, 2, 3], [4, 5, 6], 7], dtype=object)
In [98]: arr.sum()
...
TypeError: can only concatenate list (not "int") to list
The 7 element is causing the problem. If I change it to a list:
In [99]: arr=np.array([[1,2,3],[4,5,6],[7]])
In [100]: arr.sum()
Out[100]: [1, 2, 3, 4, 5, 6, 7]
I'm using a trick here. The elements of the array are lists, and for lists, [1,2,3]+[4,5] is concatenation.
The basic point is that an object array is not a 2d array. It is, in many ways, more like a list of lists.
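To make the "list of lists" point concrete, here is a minimal sketch. The name ragged is mine, and dtype=object is passed explicitly on the assumption that you are on a recent NumPy release, which refuses to build a ragged array without it.

```python
import numpy as np

# Ragged input: NumPy cannot make this 2d, so it stores three list objects.
ragged = np.array([[1, 2, 3], [4, 5, 6], [7]], dtype=object)

print(ragged.shape)     # (3,)
print(ragged.dtype)     # object
print(type(ragged[0]))  # <class 'list'>

# flatten() only collapses array dimensions; it never looks inside the lists.
flat = ragged.flatten()
print(flat.shape)       # (3,)
```

The array is already 1d, so flatten has nothing to do; the nesting lives inside the Python objects, not in the array's shape.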
The best list flattener is itertools.chain:
In [104]: list(itertools.chain(*arr))
Out[104]: [1, 2, 3, 4, 5, 6, 7]
though it too will choke on the integer 7 version.
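One possible workaround, sketched here rather than taken from the answer, is to wrap bare scalars in a list before handing the elements to chain. The helper as_iterable is a hypothetical name, and it only treats lists, tuples and arrays as containers.

```python
import itertools
import numpy as np

def as_iterable(element):
    """Pass lists, tuples and arrays through; wrap anything else in a list."""
    if isinstance(element, (list, tuple, np.ndarray)):
        return element
    return [element]

# dtype=object is needed on recent NumPy versions for ragged input.
mixed = np.array([[1, 2, 3], [4, 5, 6], 7], dtype=object)

flat = list(itertools.chain.from_iterable(as_iterable(e) for e in mixed))
print(flat)  # [1, 2, 3, 4, 5, 6, 7]
```

This flattens only one level, which matches what chain itself does.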
If the array is a list of lists (not the original mix of lists and a scalar), then np.concatenate works. It iterates over the object array just as though it were a list.
With the mixed original list, concatenate does not work, but hstack does:
In [178]: arr=np.array([[1,2,3],[4,5,6],7])
In [179]: np.concatenate(arr)
...
ValueError: all the input arrays must have same number of dimensions
In [180]: np.hstack(arr)
Out[180]: array([1, 2, 3, 4, 5, 6, 7])
That's because hstack first iterates through the list and makes sure all elements are atleast_1d. This extra iteration makes it more robust, but at a cost in processing speed.
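A rough way to see what that extra pass buys you is to do it by hand. This is a sketch of the idea, not the actual hstack source, and the names mixed, pieces and manual are mine.

```python
import numpy as np

# dtype=object is needed on recent NumPy versions for ragged input.
mixed = np.array([[1, 2, 3], [4, 5, 6], 7], dtype=object)

# Promote every element to at least a 1d array: the lists become arrays,
# and the bare 7 becomes array([7]).
pieces = [np.atleast_1d(e) for e in mixed]

# Now every piece has one dimension, so concatenate accepts them.
manual = np.concatenate(pieces)
print(manual)  # [1 2 3 4 5 6 7]

# Same values as letting hstack do the promotion itself.
same = np.array_equal(manual, np.hstack(mixed))
print(same)  # True
```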
In [170]: big1=arr.repeat(1000)
In [171]: timeit big1.sum()
10 loops, best of 3: 31.6 ms per loop
In [172]: timeit list(itertools.chain(*big1))
1000 loops, best of 3: 433 µs per loop
In [173]: timeit np.concatenate(big1)
100 loops, best of 3: 5.05 ms per loop
Doubling the size:
In [174]: big1=arr.repeat(2000)
In [175]: timeit big1.sum()
10 loops, best of 3: 128 ms per loop
In [176]: timeit list(itertools.chain(*big1))
1000 loops, best of 3: 803 µs per loop
In [177]: timeit np.concatenate(big1)
100 loops, best of 3: 9.93 ms per loop
In [182]: timeit np.hstack(big1) # the extra iteration hurts hstack speed
10 loops, best of 3: 43.1 ms per loop
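If you want to rerun the comparison outside IPython, a small script with the standard timeit module might look like the following. It is only a sketch: the names big and candidates are mine, it uses the list-of-lists version of the array (so that sum works), and the absolute numbers will differ by machine and NumPy version.

```python
import itertools
import timeit

import numpy as np

# List-of-lists version; dtype=object is needed on recent NumPy versions.
arr = np.array([[1, 2, 3], [4, 5, 6], [7]], dtype=object)
big = arr.repeat(1000)  # 3000 list objects

candidates = {
    "sum": lambda: big.sum(),
    "chain": lambda: list(itertools.chain.from_iterable(big)),
    "concatenate": lambda: np.concatenate(big),
    "hstack": lambda: np.hstack(big),
}

# Check that every approach yields the same flat sequence before timing it.
results = {name: [int(x) for x in fn()] for name, fn in candidates.items()}

for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=5) / 5
    print(f"{name:12s} {seconds * 1e3:8.3f} ms per call")
```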
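sum is quadratic in the number of elements. It is effectively doing
res = []
for e in big1:
    res = res + e   # builds a new, longer list on every step
res grows with each e, so each iteration step is more expensive than the one before.
chain has the best times.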
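The quadratic growth can be checked without any timing by counting how many elements get copied when each step builds a new list. This is a sketch under the assumption that sum on an object array reduces with +, as list concatenation does; the function name repeated_add is mine.

```python
def repeated_add(lists):
    """Concatenate with res = res + e and count the elements copied."""
    res = []
    copied = 0
    for e in lists:
        # res + e allocates a new list holding every element of both operands.
        copied += len(res) + len(e)
        res = res + e
    return res, copied

_, copied_1000 = repeated_add([[1, 2, 3]] * 1000)
_, copied_2000 = repeated_add([[1, 2, 3]] * 2000)

print(copied_1000)  # 1501500
print(copied_2000)  # 6003000
ratio = copied_2000 / copied_1000
print(ratio)        # close to 4: double the input, four times the copying
```

That factor of about 4 lines up with the jump from 31.6 ms to 128 ms in the timings above.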
You can write a custom flatten function using yield:
def flatten(arr):
    for i in arr:
        try:
            # recurse into anything iterable
            yield from flatten(i)
        except TypeError:
            # not iterable, so it is a leaf value
            yield i
Usage example:
>>> myarr = np.array([[1,2,3],[4,5,6],7])
>>> newarr = list(flatten(myarr))
>>> newarr
[1, 2, 3, 4, 5, 6, 7]
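One caveat with a recursive flattener of this kind is that strings are iterable too, and each character is itself a string, so string elements would recurse without end. A possible variant that treats str and bytes as leaves is sketched below; flatten_safe is a hypothetical name, not part of the answer above.

```python
from collections.abc import Iterable

import numpy as np

def flatten_safe(items):
    """Recursively flatten nested iterables, keeping str/bytes whole."""
    for item in items:
        if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
            yield from flatten_safe(item)
        else:
            yield item

# dtype=object is needed on recent NumPy versions for ragged input.
myarr = np.array([[1, 2, 3], [4, 5, 6], 7], dtype=object)
print(list(flatten_safe(myarr)))               # [1, 2, 3, 4, 5, 6, 7]
print(list(flatten_safe([["ab", [1]], 2])))    # ['ab', 1, 2]
```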
You can use apply_along_axis here:
>>> arr = np.array([[1,2,3],[4,5,6],[7]])
>>> np.apply_along_axis(np.concatenate, 0, arr)
array([1, 2, 3, 4, 5, 6, 7])
As a bonus, this is not quadratic in the number of lists either.