
Pythonic way for longest contiguous subsequence

I have a sorted list of integers called "blacks" and I'm looking for an elegant way to get the start "s" and end "e" of the longest contiguous subsequence (the original problem had black pixels in a w×h bitmap, and I am looking for the longest line in a given column x). My solution works but looks ugly:

# blacks is a list of integers generated from a bitmap this way:
# blacks= [y for y in range(h) if bits[y*w+x]==1]

longest=(0,0)
s=blacks[0]
e=s-1
for i in blacks:
    if e+1 == i:   # Contiguous?
        e=i
    else:
        if e-s > longest[1]-longest[0]:
            longest = (s,e)
        s=e=i
if e-s > longest[1]-longest[0]:
    longest = (s,e)
print(longest)

I feel that this could be done in a smart one- or two-liner.

You could do the following, using itertools.groupby and itertools.chain:

from itertools import groupby, chain
l = [1, 2, 5, 6, 7, 8, 10, 11, 12]
f = lambda x: x[1] - x[0] == 1  # key function to identify proper neighbours

The following is still almost readable ;-) and gets you a decent intermediate result. Proceeding from it in a more sensible manner would probably be a valid option:

max((list(g) for k, g in groupby(zip(l, l[1:]), key=f) if k), key=len)
# [(5, 6), (6, 7), (7, 8)]
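To see why this works, it can help to look at what groupby yields before max picks a winner. A small sketch, reusing l and f from above (the name runs is only for illustration and is not part of the original answer):

```python
from itertools import groupby

l = [1, 2, 5, 6, 7, 8, 10, 11, 12]
f = lambda x: x[1] - x[0] == 1  # True for a pair of direct neighbours

# zip(l, l[1:]) pairs every element with its successor; groupby then
# collects consecutive pairs that share the same key (True or False).
runs = [(k, list(g)) for k, g in groupby(zip(l, l[1:]), key=f)]
for k, g in runs:
    print(k, g)
# True [(1, 2)]
# False [(2, 5)]
# True [(5, 6), (6, 7), (7, 8)]
# False [(8, 10)]
# True [(10, 11), (11, 12)]
```

The `if k` filter in the one-liner drops the False groups, and max(..., key=len) keeps the longest of the remaining ones.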

In order to extract the actual desired sequence [5, 6, 7, 8] in one line, you have to use some more kung-fu:

sorted(set(chain(*max((list(g) for k, g in groupby(zip(l, l[1:]), key=f) if k), key=len))))
# [5, 6, 7, 8]

I shall leave it to you to work out the internals of this monstrosity :-) but keep in mind: a one-liner is often satisfying in the short run, but in the long term you are better off opting for readability and code that you and your co-workers will understand. And readability is a big part of the Pythonicity you allude to.
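For comparison, here is one way the same pair-based idea might look when spread over a few named steps. This is only a sketch: the function name longest_run and its handling of lists without any adjacent neighbours are assumptions, not something taken from the question or the one-liners above.

```python
from itertools import groupby

def longest_run(values):
    """Return (start, end) of the longest run of consecutive integers.

    Assumes `values` is a sorted list without duplicates.
    Returns None for an empty list.
    """
    pairs = zip(values, values[1:])
    # Keep only the groups of neighbouring pairs that differ by exactly 1.
    runs = [
        list(group)
        for is_adjacent, group in groupby(pairs, key=lambda p: p[1] - p[0] == 1)
        if is_adjacent
    ]
    if not runs:
        # No two neighbours are adjacent, so every run has length one.
        return (values[0], values[0]) if values else None
    best = max(runs, key=len)
    # First element of the first pair, second element of the last pair.
    return best[0][0], best[-1][1]

print(longest_run([1, 2, 5, 6, 7, 8, 10, 11, 12]))  # (5, 8)
```

Returning the pair (start, end) matches the "s" and "e" asked for in the question, and the explicit empty-runs branch avoids the ValueError that max raises on an empty sequence.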

Also note that this is O(N log N) because of the sorting. You can achieve the same result in O(N) by applying one of the O(N) duplicate-removal techniques, e.g. one involving an OrderedDict, to the output of chain, but that one line would get even longer.
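A minimal sketch of that O(N) variant, assuming OrderedDict.fromkeys is the duplicate-removal technique meant here (the names longest_pairs and result are illustrative only):

```python
from collections import OrderedDict
from itertools import chain, groupby

l = [1, 2, 5, 6, 7, 8, 10, 11, 12]
f = lambda x: x[1] - x[0] == 1

longest_pairs = max(
    (list(g) for k, g in groupby(zip(l, l[1:]), key=f) if k), key=len
)
# chain flattens the pairs to 5, 6, 6, 7, 7, 8; OrderedDict.fromkeys keeps
# only the first occurrence of each value and preserves their order.
result = list(OrderedDict.fromkeys(chain.from_iterable(longest_pairs)))
print(result)  # [5, 6, 7, 8]
```

Because the flattened pairs are already in ascending order, no sorting is needed. On Python 3.7 and later a plain dict.fromkeys should behave the same way, since dicts preserve insertion order there.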

Update:

One of the O(N) ways to do it is DanD.'s suggestion, which can be used in a single line by means of the comprehension trick that avoids assigning an intermediate result to a variable:

list(range(*[(x[0][0], x[-1][1]+1) for x in [max((list(g) for k, g in groupby(zip(l, l[1:]), key=f) if k), key=len)]][0]))
# [5, 6, 7, 8]

Prettier, however, it is not :D
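As an aside, a different technique that is not part of the answer above groups on the difference between each value and its index, which stays constant inside a run of consecutive integers. A hedged sketch, with longest_run_by_offset being a made-up name and a sorted, duplicate-free input assumed:

```python
from itertools import groupby

def longest_run_by_offset(values):
    """Return (start, end) of the longest consecutive run, or None if empty.

    Assumes `values` is sorted and contains no duplicates.
    """
    best = None
    # Inside a run of consecutive integers, value - index stays constant,
    # so groupby on that difference yields one group per run.
    for _, group in groupby(enumerate(values), key=lambda p: p[1] - p[0]):
        run = [value for _, value in group]
        if best is None or len(run) > len(best):
            best = run
    return (best[0], best[-1]) if best else None

print(longest_run_by_offset([1, 2, 5, 6, 7, 8, 10, 11, 12]))  # (5, 8)
```

This variant needs neither zip nor duplicate removal, and it treats single elements as runs of length one, so a list without any adjacent neighbours still gives a result.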


 