
How to move identical elements in a numpy array into subarrays

How do I efficiently move identical elements from a sorted numpy array into subarrays?

From here:

import numpy as np
a = np.array([0, 0, 1, 1, 1, 3, 5, 5, 5])

To here:

a2=array([[0, 0], [1, 1, 1], [3], [5, 5, 5]], dtype=object)

One approach is to find the positions where the value changes and use those indices to split the input array into subarrays. To find those indices, apply np.nonzero to the differenced array (np.diff), add 1 so each index points at the first element of a new run, and pass the result to np.split, like so:

np.split(a,np.nonzero(np.diff(a))[0]+1)

Sample run:

In [42]: a
Out[42]: array([2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 6, 6, 6])

In [43]: np.split(a,np.nonzero(np.diff(a))[0]+1)
Out[43]: 
[array([2, 2, 2, 2]),
 array([3, 3, 3, 3]),
 array([4, 4, 4, 4, 4, 4, 4]),
 array([5, 5]),
 array([6, 6, 6])]
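To make the one-liner easier to follow, here is a minimal sketch that breaks it into its intermediate steps, using the array from the question. The variable names (steps, boundaries, groups) are only illustrative and do not appear in the original answer.

```python
import numpy as np

a = np.array([0, 0, 1, 1, 1, 3, 5, 5, 5])

# Difference between each element and its predecessor;
# a non-zero entry marks a position where the value changes.
steps = np.diff(a)                      # [0 1 0 0 2 2 0 0]

# np.nonzero returns a tuple of index arrays; take the first (only) axis.
# Adding 1 shifts each index to the first element of the new run.
boundaries = np.nonzero(steps)[0] + 1   # [2 5 6]

# Split the array at those boundaries into a list of subarrays.
groups = np.split(a, boundaries)

for g in groups:
    print(g)
```

Note that np.split returns a plain Python list of arrays rather than a single object array.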

Another way to do this is with itertools.groupby. Example:

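from itertools import groupby
result = np.array([list(g) for _, g in groupby(a)], dtype=object)  # dtype=object is required for ragged groups on recent NumPy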

This also works for ordinary sorted lists, not just numpy arrays.
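As a small illustration of that point, the sketch below applies the same groupby idea to a plain Python list, with no numpy involved. The name values is just a placeholder for any sorted list.

```python
from itertools import groupby

values = [0, 0, 1, 1, 1, 3, 5, 5, 5]

# groupby yields (key, iterator) pairs for each run of consecutive
# equal items; materialise each run as a list.
groups = [list(g) for _, g in groupby(values)]

print(groups)
```

Because groupby only merges consecutive equal items, the input has to be sorted (or at least already clustered) for each value to end up in a single group.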

Demo:

In [24]: import numpy as np

In [25]: a=np.array([0,0,1,1,1,3,5,5,5])

In [26]: from itertools import groupby

In [27]: result = np.array([list(g) for _,g in groupby(a)])

In [28]: result
Out[28]: array([[0, 0], [1, 1, 1], [3], [5, 5, 5]], dtype=object)

Timing comparison with the other approach:

In [29]: %timeit np.array([list(g) for _,g in groupby(a)])
The slowest run took 6.10 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 9.86 µs per loop

In [30]: %timeit np.split(a,np.where(np.diff(a)>0)[0]+1)
10000 loops, best of 3: 29.2 µs per loop

In [31]: %timeit np.array([list(g) for _,g in groupby(a)])
100000 loops, best of 3: 10.5 µs per loop

In [33]: %timeit np.split(a,np.nonzero(np.diff(a))[0]+1)
The slowest run took 4.32 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 25.2 µs per loop
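The timings above were taken on a nine-element array, so they may not carry over to larger inputs. The following is a rough sketch for repeating the comparison on your own data with the standard timeit module; the array size, the random seed, and the helper names with_groupby and with_split are assumptions made for illustration, not part of the original answers.

```python
import timeit
from itertools import groupby

import numpy as np

# Hypothetical test data: 10,000 sorted integers drawn from 0..99.
rng = np.random.default_rng(0)
big = np.sort(rng.integers(0, 100, size=10_000))

def with_groupby(arr):
    # Pure-Python grouping of consecutive equal elements.
    return [list(g) for _, g in groupby(arr)]

def with_split(arr):
    # Split at the indices where the value changes.
    return np.split(arr, np.nonzero(np.diff(arr))[0] + 1)

# Total time in seconds for 5 calls of each approach.
t_groupby = timeit.timeit(lambda: with_groupby(big), number=5)
t_split = timeit.timeit(lambda: with_split(big), number=5)

print(f"groupby:  {t_groupby:.4f} s for 5 runs")
print(f"np.split: {t_split:.4f} s for 5 runs")
```

The numbers depend on the machine, the NumPy version, and how many distinct values the array contains, so it is worth measuring with data that resembles the real workload.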
