Finding constant subarrays in a large numpy array

Question

I have a numpy float array like:

v = np.array([1.0,1.0,2.0,2.0,2.0,2.0,...])

I need to identify all the constant segments in the array, like:

[{value:1.0,location:0,duration:2},..]

Efficiency is the main metric.

Answer 1

Here's one approach -

import numpy as np

def island_props(v):
    # Compare one-off shifted slices element-wise to get a mask of the
    # start and stop positions of each island, then get the
    # corresponding indices.
    mask = np.concatenate(( [True], v[1:] != v[:-1], [True] ))
    loc0 = np.flatnonzero(mask)

    # Get the start locations
    loc = loc0[:-1]

    # The values are the input array indexed by the start locations.
    # The lengths are the differences between consecutive start/stop indices.
    return v[loc], loc, np.diff(loc0)

Sample run -

In [143]: v
Out[143]: array([ 1.,  1.,  2.,  2.,  2.,  2.,  5.,  2.])

In [144]: value, location, lengths = island_props(v)

In [145]: value
Out[145]: array([ 1.,  2.,  5.,  2.])

In [146]: location
Out[146]: array([0, 2, 6, 7])

In [147]: lengths
Out[147]: array([2, 4, 1, 1])
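
The question asks for a list of records rather than three parallel arrays. A minimal sketch of that conversion follows; `segments_as_dicts` is a hypothetical helper name that is not part of the original answer, and the sketch assumes a non-empty 1-D input.

```python
import numpy as np

def island_props(v):
    # Same approach as above: mark every position where a new segment starts,
    # plus one sentinel past the end, then read values/starts/lengths off it.
    mask = np.concatenate(([True], v[1:] != v[:-1], [True]))
    loc0 = np.flatnonzero(mask)
    loc = loc0[:-1]
    return v[loc], loc, np.diff(loc0)

def segments_as_dicts(v):
    # Hypothetical helper (an assumption, not from the answer): repackage the
    # three arrays as the list-of-dicts layout shown in the question.
    # Assumes v is a non-empty 1-D array.
    values, locations, durations = island_props(v)
    return [
        {'value': val, 'location': loc, 'duration': dur}
        for val, loc, dur in zip(values.tolist(),
                                 locations.tolist(),
                                 durations.tolist())
    ]

v = np.array([1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 5.0, 2.0])
segments = segments_as_dicts(v)
```

Building the dicts is a Python-level loop, so for very large arrays it may be preferable to keep the three arrays and convert only when needed.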

Runtime test

Other approaches -

import itertools
def MSeifert(a):
    return [{'value': k, 'duration': len(list(v))} for k, v in
             itertools.groupby(a.tolist())]

def Kasramvd(a):
    # Use the argument a rather than the global v
    return np.split(a, np.where(np.diff(a) != 0)[0] + 1)

Timings -

In [156]: v0 = np.array([1.0,1.0,2.0,2.0,2.0,2.0,5.0,2.0])

In [157]: v = np.tile(v0,10000)

In [158]: %timeit MSeifert(v)
     ...: %timeit Kasramvd(v)
     ...: %timeit island_props(v)
     ...: 
10 loops, best of 3: 44.7 ms per loop
10 loops, best of 3: 36.1 ms per loop
10000 loops, best of 3: 140 µs per loop

Answer 2

You can group the equal items as follows, then do the rest by taking each group's size, its first element, and its index:

In [2]: v = np.array([1.0,1.0,2.0,2.0,2.0,2.0,3.0, 3.0, 5.0, 6.0, 6.0])

In [4]: np.split(v, np.where(np.diff(v) != 0)[0] + 1)
Out[4]: 
[array([ 1.,  1.]),
 array([ 2.,  2.,  2.,  2.]),
 array([ 3.,  3.]),
 array([ 5.]),
 array([ 6.,  6.])]

The expression np.diff(v) != 0 marks the places where the sequence changes (the difference is not 0), and np.where() gives you the indices of those places (from the boolean result). Adding 1 turns each index into the position where the next segment begins. Then you can simply split the array at those positions using np.split().
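
To make the three steps concrete, here is a small sketch that keeps each intermediate result in its own variable. The array and the names `changed`, `boundaries`, and `pieces` are chosen for illustration only.

```python
import numpy as np

v = np.array([1.0, 1.0, 2.0, 2.0, 3.0])

# Step 1: True wherever an element differs from the one before it.
# np.diff(v) is [0., 1., 0., 1.], so changed is [False, True, False, True].
changed = np.diff(v) != 0

# Step 2: indices of the True entries, shifted by one so that each index
# points at the first element of a new segment: [2, 4].
boundaries = np.where(changed)[0] + 1

# Step 3: cut the array at those positions.
# pieces holds three arrays: [1., 1.], [2., 2.], [3.]
pieces = np.split(v, boundaries)
```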

And finally you can use a list comprehension to get the desired result:

In [7]: locations = np.where(np.diff(v) != 0)[0] + 1

In [8]: result = np.split(v, locations)

In [9]: starts = np.concatenate(([0], locations))  # the first segment starts at index 0

In [10]: [{'value':arr[0], 'location':loc, 'duration':arr.size} for loc, arr in zip(starts, result)]
Out[10]: 
[{'duration': 2, 'value': 1.0, 'location': 0},
 {'duration': 4, 'value': 2.0, 'location': 2},
 {'duration': 2, 'value': 3.0, 'location': 6},
 {'duration': 1, 'value': 5.0, 'location': 8},
 {'duration': 2, 'value': 6.0, 'location': 9}]

Answer 3

You could use itertools.groupby. It could be a bit slower (I haven't timed it) but it is probably a lot easier to understand:

>>> import numpy as np
>>> import itertools
>>> a = np.array([1.0,1.0,2.0,2.0,2.0,2.0])
>>> [{'value': k, 'duration': len(list(v))} for k, v in itertools.groupby(a.tolist())]
[{'duration': 2, 'value': 1.0}, {'duration': 4, 'value': 2.0}]
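
The one-liner above reports the value and duration but not the location that the question also asks for. One possible way to add it is to keep a running offset while iterating over the groups. This is only a sketch, and `groupby_segments` is a hypothetical name that is not part of the original answer.

```python
import itertools
import numpy as np

def groupby_segments(a):
    # Hypothetical helper (an assumption, not from the answer): walk the runs
    # produced by itertools.groupby and track where each run starts.
    segments = []
    location = 0
    for value, run in itertools.groupby(a.tolist()):
        duration = sum(1 for _ in run)  # length of this run of equal values
        segments.append({'value': value,
                         'location': location,
                         'duration': duration})
        location += duration  # the next run starts right after this one
    return segments

a = np.array([1.0, 1.0, 2.0, 2.0, 2.0, 2.0])
segments = groupby_segments(a)
```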
