Efficient way to find indices of repeated sequences in a list?

Question

I have a large list of numbers in Python, and I want to write a function that finds sections of the list where the same number is repeated more than n times. For example, if n is 3 then my function should return the following results for the following examples:

When applied to example = [1,2,1,1,1,1,2,3] the function should return [(2,6)], because example[2:6] is a sequence containing all the same value.

When applied to example = [0,0,0,7,3,2,2,2,2,1] the function should return [(0,3), (5,9)] because both example[0:3] and example[5:9] contain repeated sequences of the same value.

When applied to example = [1,2,1,2,1,2,1,2,1,2] the function should return [] because there is no sequence of three or more elements that are all the same number.

I know I could write a bunch of loops to get what I want, but that seems kind of inefficient, and I was wondering if there was an easier option to obtain what I wanted.

Answer 1

Use itertools.groupby and enumerate:

>>> from itertools import groupby
>>> n = 3
>>> x = [1,2,1,1,1,1,2,3] 
>>> grouped = (list(g) for _,g in groupby(enumerate(x), lambda t:t[1]))
>>> [(g[0][0], g[-1][0] + 1) for g in grouped if len(g) >= n]
[(2, 6)]
>>> x = [0,0,0,7,3,2,2,2,2,1]
>>> grouped = (list(g) for _,g in groupby(enumerate(x), lambda t:t[1]))
>>> [(g[0][0], g[-1][0] + 1) for g in grouped if len(g) >= n]
[(0, 3), (5, 9)]
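The two sessions above can be wrapped into a reusable function. The following is a minimal sketch, not part of the original answer: the name find_runs is hypothetical, and it groups the raw values and tracks a running offset instead of using enumerate, so no list of (index, value) tuples is built for each group. It assumes n is at least 1.

```python
from itertools import groupby

def find_runs(values, n):
    """Return (start, stop) slices for runs of at least n equal values."""
    runs = []
    start = 0  # index where the current group begins
    for _, group in groupby(values):
        length = sum(1 for _ in group)  # consume the group to count it
        if length >= n:
            runs.append((start, start + length))
        start += length
    return runs

print(find_runs([1, 2, 1, 1, 1, 1, 2, 3], 3))        # [(2, 6)]
print(find_runs([0, 0, 0, 7, 3, 2, 2, 2, 2, 1], 3))  # [(0, 3), (5, 9)]
```

Each returned pair can be used directly as a slice, so values[start:stop] gives the repeated section.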

To understand groupby: each iteration yields the value of the key, which is used to group the elements of the iterable, along with a new lazy iterable that will iterate over that group.

>>> list(groupby(enumerate(x), lambda t:t[1]))
[(0, <itertools._grouper object at 0x7fc90a707bd0>), (7, <itertools._grouper object at 0x7fc90a707ad0>), (3, <itertools._grouper object at 0x7fc90a707950>), (2, <itertools._grouper object at 0x7fc90a707c10>), (1, <itertools._grouper object at 0x7fc90a707c50>)]
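The grouper objects in that output hide what each group contains. As a small illustration (the variable name expanded is only for this sketch), each group can be turned into a list while the groupby object is being iterated, which shows the (index, value) pairs that the answer's list comprehension works with:

```python
from itertools import groupby

x = [0, 0, 0, 7, 3, 2, 2, 2, 2, 1]

# Materialize every group as it is produced, before groupby advances.
expanded = [(key, list(group))
            for key, group in groupby(enumerate(x), key=lambda t: t[1])]

for key, members in expanded:
    print(key, members)
# 0 [(0, 0), (1, 0), (2, 0)]
# 7 [(3, 7)]
# 3 [(4, 3)]
# 2 [(5, 2), (6, 2), (7, 2), (8, 2)]
# 1 [(9, 1)]
```

The first and last tuple of each group give the start index and the last index of the run, which is why the answer takes g[0][0] and g[-1][0] + 1. Groups share the underlying iterator, and the itertools documentation notes that a group is no longer visible once the groupby object advances, so each group should be consumed (for example with list) before moving on to the next one.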

Answer 2

You can do this in a single loop with the following algorithm:

def find_pairs(array, n):
    result_pairs = []
    idx = 0  # start index of the current run
    # Go one step past the end so that a run reaching the end is also reported.
    for i in range(1, len(array) + 1):
        # A run ends at the end of the list or where the value changes.
        if i == len(array) or array[i] != array[idx]:
            if i - idx >= n:
                result_pairs.append((idx, i))
            idx = i  # the next run starts here
    return result_pairs

And you call the function like this: find_pairs(list, n). It runs in O(len(array)) time, which is asymptotically optimal because every element has to be inspected at least once. I think it is pretty simple to understand, but if you have any doubts just ask.
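Since the question mentions a large list, the same single-pass idea can also be sketched as a generator that accepts any iterable and yields results lazily. This is an illustrative variant, not the answer's own code: the name iter_runs is hypothetical, and it assumes n is at least 1.

```python
def iter_runs(values, n):
    """Yield (start, stop) for each run of at least n equal consecutive values."""
    prev = None
    start = 0  # index where the current run begins
    count = 0  # length of the current run; 0 means nothing has been seen yet
    for i, value in enumerate(values):
        if count and value == prev:
            count += 1
        else:
            # The value changed: report the finished run if it is long enough.
            if count >= n:
                yield (start, i)
            prev, start, count = value, i, 1
    # Report a run that reaches the end of the input.
    if count >= n:
        yield (start, start + count)

print(list(iter_runs([0, 0, 0, 7, 3, 2, 2, 2, 2, 1], 3)))  # [(0, 3), (5, 9)]
print(list(iter_runs([5, 5, 5, 6, 6, 6], 3)))              # [(0, 3), (3, 6)]
```

The second call covers two cases worth testing in any implementation: two qualifying runs that are directly adjacent, and a run that ends exactly at the end of the input.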

Answer 3

You could use this. Note that your question is ambiguous as to the role of n. I assume here that a series of n equal values should be matched. If it should have at least n+1 values, then replace >= by >:

def monotoneRanges(a, n):
    idx = [i for i, v in enumerate(a) if not i or a[i-1] != v] + [len(a)]
    return [r for r in zip(idx, idx[1:]) if r[1] >= r[0]+n]

# example call
res = monotoneRanges([0,0,0,7,3,2,2,2,2,1], 3)

print(res)

Outputs:

[(0, 3), (5, 9)]
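To see how this approach arrives at that output, the intermediate values can be printed step by step. The sketch below splits the computation into a hypothetical helper, run_boundaries, and also shows the effect of the >= versus > choice mentioned above:

```python
def run_boundaries(a):
    # Index of every position where a new run starts, plus len(a) as the final stop.
    return [i for i, v in enumerate(a) if i == 0 or a[i - 1] != v] + [len(a)]

a = [0, 0, 0, 7, 3, 2, 2, 2, 2, 1]
idx = run_boundaries(a)
print(idx)    # [0, 3, 4, 5, 9, 10]

# Consecutive boundaries are the (start, stop) of every run, long or short.
pairs = list(zip(idx, idx[1:]))
print(pairs)  # [(0, 3), (3, 4), (4, 5), (5, 9), (9, 10)]

n = 3
at_least_n = [(s, e) for s, e in pairs if e - s >= n]
more_than_n = [(s, e) for s, e in pairs if e - s > n]
print(at_least_n)   # [(0, 3), (5, 9)]
print(more_than_n)  # [(5, 9)]
```

With >= the run of three zeros is kept, while with > only the run of four twos remains, which is the difference between the two readings of the question.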

Answer 1: score 2, accepted, 2016-10-12 19:12:22
Answer 2: score 1, 2016-10-12 19:28:23
Answer 3: score 0, 2016-10-12 19:58:35