
Find constant subarrays in large numpy array

I have a numpy float array like

v = np.array([1.0,1.0,2.0,2.0,2.0,2.0,...])

I need to identify all the constant segments in the array, like

[{value:1.0,location:0,duration:2},..]

Efficiency is the main metric.

Here's one approach -

import numpy as np

def island_props(v):
    # Get one-off shifted slices and then compare element-wise, to give
    # us a mask of start and stop positions for each island.
    # Also, get the corresponding indices.
    mask = np.concatenate(( [True], v[1:] != v[:-1], [True] ))
    loc0 = np.flatnonzero(mask)

    # Get the start locations
    loc = loc0[:-1]

    # The values would be the input array indexed by the start locations.
    # The lengths would be the differences between start and stop indices.
    return v[loc], loc, np.diff(loc0)

Sample run -

In [143]: v
Out[143]: array([ 1.,  1.,  2.,  2.,  2.,  2.,  5.,  2.])

In [144]: value, location, lengths = island_props(v)

In [145]: value
Out[145]: array([ 1.,  2.,  5.,  2.])

In [146]: location
Out[146]: array([0, 2, 6, 7])

In [147]: lengths
Out[147]: array([2, 4, 1, 1])
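To get the list-of-dicts layout asked for in the question, the three arrays can be zipped together. This is a minimal sketch that assumes the island_props function above; the helper name islands_as_dicts is hypothetical and not part of the original answer.

```python
import numpy as np

def island_props(v):
    # Same approach as above: mark every position where a new run starts,
    # plus a sentinel one past the end of the array.
    mask = np.concatenate(([True], v[1:] != v[:-1], [True]))
    loc0 = np.flatnonzero(mask)
    loc = loc0[:-1]
    return v[loc], loc, np.diff(loc0)

def islands_as_dicts(v):
    # Hypothetical helper: convert the three arrays to plain Python values
    # and zip them into one record per constant segment.
    values, locations, lengths = island_props(v)
    return [
        {'value': val, 'location': loc, 'duration': dur}
        for val, loc, dur in zip(values.tolist(), locations.tolist(), lengths.tolist())
    ]

v = np.array([1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 5.0, 2.0])
records = islands_as_dicts(v)
print(records[:2])
# [{'value': 1.0, 'location': 0, 'duration': 2}, {'value': 2.0, 'location': 2, 'duration': 4}]
```

Building the list is a Python-level loop, so on large inputs it will likely cost more than island_props itself. If efficiency matters most, keeping the three arrays is probably the better choice.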

Runtime test

Other approaches -

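import itertools
import numpy as np

def MSeifert(a):
    # Group consecutive equal items and count the size of each group.
    return [{'value': k, 'duration': len(list(v))} for k, v in 
             itertools.groupby(a.tolist())]

def Kasramvd(a):
    # Split the array at every index where the value changes.
    return np.split(a, np.where(np.diff(a) != 0)[0] + 1)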

Timings -

In [156]: v0 = np.array([1.0,1.0,2.0,2.0,2.0,2.0,5.0,2.0])

In [157]: v = np.tile(v0,10000)

In [158]: %timeit MSeifert(v)
     ...: %timeit Kasramvd(v)
     ...: %timeit island_props(v)
     ...: 
10 loops, best of 3: 44.7 ms per loop
10 loops, best of 3: 36.1 ms per loop
10000 loops, best of 3: 140 µs per loop
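Outside IPython, a comparison like this can be approximated with the standard timeit module. The sketch below is only an assumption about how one might reproduce the measurement; the function names groupby_runs and split_runs are hypothetical stand-ins for the two other approaches, and the absolute numbers will depend on the machine and the NumPy version.

```python
import itertools
import timeit

import numpy as np

def island_props(v):
    # Mask-based approach: run starts plus a sentinel at the end.
    mask = np.concatenate(([True], v[1:] != v[:-1], [True]))
    loc0 = np.flatnonzero(mask)
    return v[loc0[:-1]], loc0[:-1], np.diff(loc0)

def groupby_runs(a):
    # itertools.groupby approach (values and durations only).
    return [{'value': k, 'duration': len(list(g))}
            for k, g in itertools.groupby(a.tolist())]

def split_runs(a):
    # np.split approach: one sub-array per constant segment.
    return np.split(a, np.where(np.diff(a) != 0)[0] + 1)

v = np.tile(np.array([1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 5.0, 2.0]), 10000)

timings = {}
for fn in (groupby_runs, split_runs, island_props):
    # Best of 3 repeats with 5 calls each, reported as seconds per call.
    timings[fn.__name__] = min(timeit.repeat(lambda: fn(v), number=5, repeat=3)) / 5

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.3f} ms per call")
```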

You can group the equal items as follows, then do the rest by taking the size, the first element, and the start index of each group:

In [2]: v = np.array([1.0,1.0,2.0,2.0,2.0,2.0,3.0, 3.0, 5.0, 6.0, 6.0])

In [4]: np.split(v, np.where(np.diff(v) != 0)[0] + 1)
Out[4]: 
[array([ 1.,  1.]),
 array([ 2.,  2.,  2.,  2.]),
 array([ 3.,  3.]),
 array([ 5.]),
 array([ 6.,  6.])]

The expression np.diff(v) != 0 marks the positions where the sequence changes (the difference is not 0), and np.where() gives you the indices of those positions from the boolean result. Adding 1 turns each index into the first index of the next segment. Then you can simply split the array at those indices using np.split().
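The intermediate values make the steps easier to follow. This is a small illustrative sketch using the same array as above; the variable names changed, change_idx, and starts are introduced here only for illustration.

```python
import numpy as np

v = np.array([1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 5.0, 6.0, 6.0])

changed = np.diff(v) != 0          # True where v[i + 1] differs from v[i]
change_idx = np.where(changed)[0]  # indices i at which a change occurs
starts = change_idx + 1            # first index of each new segment
groups = np.split(v, starts)       # one sub-array per constant segment

print(change_idx)   # [1 5 7 8]
print(starts)       # [2 6 8 9]
print(len(groups))  # 5
```

Note that the first segment always starts at index 0, which does not appear in starts, so there is one more group than there are split points.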

And finally you can use a list comprehension to get the desired result:

In [7]: locations = np.where(np.diff(v) != 0)[0] + 1

In [8]: result = np.split(v, locations)

In [9]: starts = np.concatenate(([0], locations))  # the first segment starts at index 0

In [10]: [{'value':arr[0], 'location':loc, 'duration':arr.size} for loc, arr in zip(starts, result)]
Out[10]: 
[{'duration': 2, 'value': 1.0, 'location': 0},
 {'duration': 4, 'value': 2.0, 'location': 2},
 {'duration': 2, 'value': 3.0, 'location': 6},
 {'duration': 1, 'value': 5.0, 'location': 8},
 {'duration': 2, 'value': 6.0, 'location': 9}]

You could use itertools.groupby. It could be a bit slower (I haven't timed it) but is probably a lot easier to understand:

>>> import numpy as np
>>> import itertools
>>> a = np.array([1.0,1.0,2.0,2.0,2.0,2.0])
>>> [{'value': k, 'duration': len(list(v))} for k, v in itertools.groupby(a.tolist())]
[{'duration': 2, 'value': 1.0}, {'duration': 4, 'value': 2.0}]
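The output above has no location field. One way to add it, sketched here under the assumption that location means the start index of each segment, is to keep a running offset while iterating over the groups. The helper name runs_with_location is hypothetical.

```python
import itertools

import numpy as np

def runs_with_location(a):
    # Hypothetical helper: walk the groups in order and track where
    # each one starts by accumulating the durations seen so far.
    out = []
    start = 0
    for value, group in itertools.groupby(a.tolist()):
        duration = sum(1 for _ in group)
        out.append({'value': value, 'location': start, 'duration': duration})
        start += duration
    return out

a = np.array([1.0, 1.0, 2.0, 2.0, 2.0, 2.0])
print(runs_with_location(a))
# [{'value': 1.0, 'location': 0, 'duration': 2}, {'value': 2.0, 'location': 2, 'duration': 4}]
```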

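Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source address. For any questions, contact yoyou2525@163.com.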
