Efficient way to find the index of repeated sequence in a list?

I have a large list of numbers in Python, and I want to write a function that finds sections of the list where the same number is repeated more than n times. For example, if n is 3 then my function should return the following results for the following examples:

When applied to example = [1,2,1,1,1,1,2,3] the function should return [(2,6)], because example[2:6] is a sequence containing all the same value.

When applied to example = [0,0,0,7,3,2,2,2,2,1] the function should return [(0,3), (5,9)] because both example[0:3] and example[5:9] contain repeated sequences of the same value.

When applied to example = [1,2,1,2,1,2,1,2,1,2] the function should return [] because there is no sequence of three or more elements that are all the same number.

I know I could write a bunch of loops to get what I want, but that seems kind of inefficient, and I was wondering if there is an easier way to get it.

Use itertools.groupby and enumerate:

>>> from itertools import groupby
>>> n = 3
>>> x = [1,2,1,1,1,1,2,3] 
>>> grouped = (list(g) for _,g in groupby(enumerate(x), lambda t:t[1]))
>>> [(g[0][0], g[-1][0] + 1) for g in grouped if len(g) >= n]
[(2, 6)]
>>> x = [0,0,0,7,3,2,2,2,2,1]
>>> grouped = (list(g) for _,g in groupby(enumerate(x), lambda t:t[1]))
>>> [(g[0][0], g[-1][0] + 1) for g in grouped if len(g) >= n]
[(0, 3), (5, 9)]
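
For reuse, the two lines above could be wrapped in a function. This is only a sketch: the name find_runs is made up here and is not part of the original answer. It applies the same groupby/enumerate idea to the three examples from the question.

```python
from itertools import groupby

def find_runs(x, n):
    """Return (start, stop) pairs for runs of at least n equal values (hypothetical helper)."""
    result = []
    # Group (index, value) pairs by value; consecutive equal values form one group.
    for _, g in groupby(enumerate(x), key=lambda t: t[1]):
        g = list(g)
        if len(g) >= n:
            # First index of the run, and one past its last index.
            result.append((g[0][0], g[-1][0] + 1))
    return result

print(find_runs([1, 2, 1, 1, 1, 1, 2, 3], 3))        # [(2, 6)]
print(find_runs([0, 0, 0, 7, 3, 2, 2, 2, 2, 1], 3))  # [(0, 3), (5, 9)]
print(find_runs([1, 2, 1, 2, 1, 2, 1, 2, 1, 2], 3))  # []
```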

To understand groupby: each iteration returns the key value used to group the elements of the iterable, along with a new lazy iterable that iterates over that group.

>>> list(groupby(enumerate(x), lambda t:t[1]))
[(0, <itertools._grouper object at 0x7fc90a707bd0>), (7, <itertools._grouper object at 0x7fc90a707ad0>), (3, <itertools._grouper object at 0x7fc90a707950>), (2, <itertools._grouper object at 0x7fc90a707c10>), (1, <itertools._grouper object at 0x7fc90a707c50>)]
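
The grouper objects above are opaque, so it may help to turn each one into a list while iterating. A small sketch, using the same x as before (the variable name groups is just for illustration); note that each group has to be consumed before groupby advances to the next one:

```python
from itertools import groupby

x = [0, 0, 0, 7, 3, 2, 2, 2, 2, 1]
# Turn each lazy group into a list so its (index, value) pairs are visible.
groups = [(key, list(g)) for key, g in groupby(enumerate(x), key=lambda t: t[1])]
for key, members in groups:
    print(key, members)
# 0 [(0, 0), (1, 0), (2, 0)]
# 7 [(3, 7)]
# 3 [(4, 3)]
# 2 [(5, 2), (6, 2), (7, 2), (8, 2)]
# 1 [(9, 1)]
```

The first index in a group is the start of the run, and the last index plus one is its end, which is where g[0][0] and g[-1][0] + 1 come from.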

You can do this in a single loop with the following algorithm:

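def find_pairs(array, n):
    result_pairs = []
    prev = idx = 0
    count = 1
    for i in range(len(array)):
        if i > 0 and array[i] == prev:
            # Still inside the current run.
            count += 1
        else:
            # The previous run just ended: record it if it is long enough.
            if i > 0 and count >= n:
                result_pairs.append((idx, i))
            # Start a new run at i, whether or not the last one was recorded.
            prev = array[i]
            idx = i
            count = 1
    # The loop never closes the final run, so check it here.
    if array and count >= n:
        result_pairs.append((idx, len(array)))
    return result_pairs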

Call the function like this: find_pairs(list, n). It runs in O(len(array)) time, which is asymptotically the best you can do for this task, since every element has to be examined at least once. I think it is pretty simple to understand, but if you have any doubts just ask.

You could use this. Note that your question is ambiguous as to the role of n. I assume here that a run of n equal values should be matched. If a run should have at least n+1 values, then replace >= by >:

def monotoneRanges(a, n):
    idx = [i for i, v in enumerate(a) if not i or a[i-1] != v] + [len(a)]
    return [r for r in zip(idx, idx[1:]) if r[1] >= r[0]+n]

# example call
res = monotoneRanges([0,0,0,7,3,2,2,2,2,1], 3)

print(res)

Outputs:

[(0, 3), (5, 9)]
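
To see why this works, it can help to print the intermediate values. The sketch below repeats the two steps of monotoneRanges on the same input; the variable name runs is introduced only for illustration:

```python
a = [0, 0, 0, 7, 3, 2, 2, 2, 2, 1]
n = 3

# Indices where a new run starts, plus len(a) as a sentinel closing the last run.
idx = [i for i, v in enumerate(a) if not i or a[i-1] != v] + [len(a)]
print(idx)  # [0, 3, 4, 5, 9, 10]

# Consecutive boundaries give the (start, stop) of every run.
runs = list(zip(idx, idx[1:]))
print(runs)  # [(0, 3), (3, 4), (4, 5), (5, 9), (9, 10)]

# Keep only runs whose length, stop - start, is at least n.
print([r for r in runs if r[1] - r[0] >= n])  # [(0, 3), (5, 9)]
```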
