
Sequence of elements in a list satisfying a condition

Assume I have a list of this type:

#    0   1  2  3   4  5  6  7  8  9   10  11 -- list index
li=[-1, -1, 2, 2, -1, 1, 1, 1, 1, 1, -1, -1 ]   

I want to find each index at which the value stays the same for n consecutive elements, starting at that index.

I can do it (laboriously) this way:

def sub_seq(li,n):
    ans={}
    for x in set(li):
        # range() avoids the li[:-n+1] slice, which is empty when n == 1
        ans[x]=[i for i in range(len(li)-n+1) if all(x==y for y in li[i:i+n])]

    ans={k:v for k,v in ans.items() if v}

    return ans

li=[-1, -1, 2, 2, -1, 1, 1, 1, 1, 1, -1, -1] 
for i in (5,4,3,2):
    print(i, sub_seq(li,i))

Prints:

5 {1: [5]}
4 {1: [5, 6]}
3 {1: [5, 6, 7]}
2 {1: [5, 6, 7, 8], 2: [2], -1: [0, 10]}

Is there a better way to do this?

Analyzing data is typically easier if you first convert it to a convenient form. In this case, a run-length encoding would be a good starting point:

from itertools import groupby, accumulate
from collections import defaultdict

def sub_seq(li, n):
    d = defaultdict(list)
    rle = [(k, len(list(g))) for k, g in groupby(li)]
    endpoints = accumulate(size for k, size in rle)
    for end_index, (value, count) in zip(endpoints, rle):
        for index in range(end_index - count, end_index - n + 1):
            d[value].append(index)
    return dict(d)
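
To make the intermediate values concrete, here is a small sketch of the run-length encoding and the running end indices for the list in the question. The helper name `run_lengths` is hypothetical and only used for illustration; it is not part of the answer above.

from itertools import groupby, accumulate

def run_lengths(li):
    """Return (value, run_length) pairs for consecutive equal elements."""
    return [(k, sum(1 for _ in g)) for k, g in groupby(li)]

li = [-1, -1, 2, 2, -1, 1, 1, 1, 1, 1, -1, -1]
rle = run_lengths(li)
# Exclusive end index of each run: a running total of the run lengths.
ends = list(accumulate(size for _, size in rle))
print(rle)   # [(-1, 2), (2, 2), (-1, 1), (1, 5), (-1, 2)]
print(ends)  # [2, 4, 5, 10, 12]

A run that ends at end_index and has count elements contributes the starting indices range(end_index - count, end_index - n + 1), which is empty whenever count < n.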

As Raymond Hettinger points out in his answer, groupby makes it easier to check consecutive values. If you also enumerate the list, you can keep the corresponding indices and add them to the dictionary (I use defaultdict to make the function as short as possible):

from itertools import groupby
from operator import itemgetter
from collections import defaultdict

li = [-1, -1, 2, 2, -1, 1, 1, 1, 1, 1, -1, -1]

def sub_seq(li, n):
    res = defaultdict(list)
    for k, g in groupby(enumerate(li), itemgetter(1)):
        l = list(map(itemgetter(0), g))
        if n <= len(l): res[k] += l[0:len(l)-n+1]
    return res

for i in (5,4,3,2):
    print(i, sub_seq(li,i))

Which prints (on Python 3.7+, where dicts keep insertion order):

5 defaultdict(<class 'list'>, {1: [5]})
4 defaultdict(<class 'list'>, {1: [5, 6]})
3 defaultdict(<class 'list'>, {1: [5, 6, 7]})
2 defaultdict(<class 'list'>, {-1: [0, 10], 2: [2], 1: [5, 6, 7, 8]})
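
It may help to look at what grouping the enumerated list actually yields before any filtering by n. The following is only an illustrative sketch; the variable name `runs` is an assumption and does not appear in the function above.

from itertools import groupby
from operator import itemgetter

li = [-1, -1, 2, 2, -1, 1, 1, 1, 1, 1, -1, -1]

# Group (index, value) pairs by value; each group keeps the indices of one run.
runs = [(value, [i for i, _ in group])
        for value, group in groupby(enumerate(li), key=itemgetter(1))]
for value, indices in runs:
    print(value, indices)

Each run of length m then contributes its first m - n + 1 indices, which is what the slice l[0:len(l)-n+1] selects.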

I personally think that this is a bit more readable, constructs fewer objects, and, I would guess, runs faster.

li=[-1, -1, 2, 2, -1, 1, 1, 1, 1, 1, -1, -1 ]

results = []
i = 0
while i < len(li):
    j = i + 1
    while j < len(li) and li[i] == li[j]:
        j += 1
    results.append((i,li[i],j-i))
    i = j

print(results)  # [(0, -1, 2), (2, 2, 2), (4, -1, 1), (5, 1, 5), (10, -1, 2)]
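
The loop above produces (start, value, length) runs rather than the dictionary asked for in the question. One possible way to bridge that gap is sketched below; the helper name `runs_to_indices` is hypothetical, and the `runs` literal simply repeats the output shown above so the snippet is self-contained.

from collections import defaultdict

def runs_to_indices(runs, n):
    """Map each value to the start indices of windows of n equal elements.

    `runs` is a list of (start, value, length) tuples.
    """
    out = defaultdict(list)
    for start, value, length in runs:
        # A run of `length` items holds length - n + 1 windows of size n;
        # the range is empty when the run is shorter than n.
        out[value].extend(range(start, start + length - n + 1))
    # Drop values that never had a long enough run.
    return {k: v for k, v in out.items() if v}

runs = [(0, -1, 2), (2, 2, 2), (4, -1, 1), (5, 1, 5), (10, -1, 2)]
for n in (5, 4, 3, 2):
    print(n, runs_to_indices(runs, n))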

