
Pythonic way for longest contiguous subsequence

I have a sorted list of integers called "blacks" and I'm looking for an elegant way to get the start "s" and end "e" of the longest contiguous subsequence. The original problem involves black pixels in a w×h bitmap, and I am looking for the longest line in a given column x. My solution works but looks ugly:

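# blacks is a sorted, non-empty list of integers generated from a bitmap this way:
# blacks = [y for y in range(h) if bits[y*w+x] == 1]

s = blacks[0]
e = s - 1
longest = (s, s)    # a single pixel counts as a run, so isolated pixels are not lost
for i in blacks:
    if e + 1 == i:  # Contiguous?
        e = i
    else:           # Gap: keep the finished run if it is the longest so far
        if e - s > longest[1] - longest[0]:
            longest = (s, e)
        s = e = i
if e - s > longest[1] - longest[0]:  # the last run is never followed by a gap
    longest = (s, e)
print(longest)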
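As a sketch, here is the same idea wrapped in functions, together with the "blacks" comprehension from the comment. It checks the current run after every element, so no separate final check is needed. The names column_blacks and longest_run are hypothetical. The sketch also assumes that bits is a flat, row-major list of 0/1 values of length w*h.

```python
def column_blacks(bits, w, h, x):
    """Return the sorted y-coordinates of black pixels in column x.

    Assumes bits is a flat, row-major list of 0/1 values of length w*h.
    """
    return [y for y in range(h) if bits[y * w + x] == 1]


def longest_run(blacks):
    """Return (start, end) of the longest run of consecutive integers.

    Returns None for an empty list; on ties the earliest run wins.
    """
    if not blacks:
        return None
    s = e = blacks[0]
    longest = (s, e)
    for i in blacks[1:]:
        if i == e + 1:      # still contiguous: extend the current run
            e = i
        else:               # gap: start a new run at i
            s = e = i
        if e - s > longest[1] - longest[0]:
            longest = (s, e)
    return longest


# A 2x6 bitmap (w=2, h=6); column 0 reads 1,0,1,1,1,0 from top to bottom.
bits = [1, 0,
        0, 1,
        1, 1,
        1, 0,
        1, 1,
        0, 0]
blacks = column_blacks(bits, w=2, h=6, x=0)
print(blacks)               # [0, 2, 3, 4]
print(longest_run(blacks))  # (2, 4)
```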

I feel that this could be done in a smart one- or two-liner.

You could do the following, using itertools.groupby and itertools.chain:

from itertools import groupby, chain
l = [1, 2, 5, 6, 7, 8, 10, 11, 12]
f = lambda x: x[1] - x[0] == 1  # key function to identify proper neighbours
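It may help to materialize the groups that groupby produces over the adjacent pairs, because these are what the later max call chooses between. The following sketch uses the same l and f and is written for Python 3:

```python
from itertools import groupby

l = [1, 2, 5, 6, 7, 8, 10, 11, 12]
f = lambda x: x[1] - x[0] == 1  # True when the pair (a, b) holds consecutive integers

pairs = list(zip(l, l[1:]))     # adjacent pairs: (1, 2), (2, 5), (5, 6), ...
groups = [(k, list(g)) for k, g in groupby(pairs, key=f)]
for k, g in groups:
    print(k, g)
# True [(1, 2)]
# False [(2, 5)]
# True [(5, 6), (6, 7), (7, 8)]
# False [(8, 10)]
# True [(10, 11), (11, 12)]
```

The groups with key True are exactly the contiguous runs, expressed as pairs. The longest one is the answer.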

The following is still almost readable ;-) and gives you a decent intermediate step. Proceeding from there in a more sensible manner would probably be a valid option:

max((list(g) for k, g in groupby(zip(l, l[1:]), key=f) if k), key=len)
# [(5, 6), (6, 7), (7, 8)]
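Here is one possible "more sensible" continuation from that intermediate step, offered as a sketch rather than the answer's own code. The start of the run is the first element of the first pair, and the end is the second element of the last pair, which gives the (s, e) the question asks for. The default=None fallback of max (Python 3.4+) is an addition. It avoids a ValueError when no two elements are consecutive. The sketch assumes l is non-empty.

```python
from itertools import groupby

l = [1, 2, 5, 6, 7, 8, 10, 11, 12]
f = lambda x: x[1] - x[0] == 1  # pair (a, b) holds consecutive integers

# Longest group of neighbour pairs; default=None instead of a ValueError
# when there are no neighbours at all (e.g. l = [1, 3, 5]).
best = max((list(g) for k, g in groupby(zip(l, l[1:]), key=f) if k),
           key=len, default=None)

if best is None:
    s = e = l[0]                    # every run has length 1; take the first
else:
    s, e = best[0][0], best[-1][1]  # first of first pair, last of last pair
print(s, e)  # 5 8
```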

In order to extract the actual desired sequence [5, 6, 7, 8] in one line, you have to use some more kung-fu:

sorted(set(chain(*max((list(g) for k, g in groupby(zip(l, l[1:]), key=f) if k), key=len))))
# [5, 6, 7, 8]

I shall leave it to you to work out the internals of this monstrosity :-) but keep in mind: a one-liner is often satisfying in the short run, but in the long term you are better off opting for readability and for code that you and your co-workers will understand. Readability is a big part of the Pythonicity you allude to.
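For reference, the one-liner can be unpacked into named steps as follows. The intermediate names are only for illustration:

```python
from itertools import groupby, chain

l = [1, 2, 5, 6, 7, 8, 10, 11, 12]
f = lambda x: x[1] - x[0] == 1

best = max((list(g) for k, g in groupby(zip(l, l[1:]), key=f) if k), key=len)
flat = list(chain(*best))   # flatten the pairs: [5, 6, 6, 7, 7, 8]
unique = set(flat)          # drop the duplicated inner elements
result = sorted(unique)     # sets are unordered, so sort to restore order
print(result)               # [5, 6, 7, 8]
```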

Also note that this is O(N log N) because of the sorting. You can achieve the same result in O(N) by applying one of the O(N) duplicate-removal techniques (e.g. one involving an OrderedDict) to the output of chain, but that one line would get even longer.
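A minimal sketch of the OrderedDict variant follows. OrderedDict.fromkeys keeps the first occurrence of each key in insertion order. Because the flattened pairs are already ascending, this removes the duplicates without a sort. On Python 3.7+ a plain dict.fromkeys would behave the same way, since dicts preserve insertion order there.

```python
from collections import OrderedDict
from itertools import chain, groupby

l = [1, 2, 5, 6, 7, 8, 10, 11, 12]
f = lambda x: x[1] - x[0] == 1

best = max((list(g) for k, g in groupby(zip(l, l[1:]), key=f) if k), key=len)
# Hash-based de-duplication in one pass; order comes from the (sorted) input.
result = list(OrderedDict.fromkeys(chain(*best)))
print(result)  # [5, 6, 7, 8]
```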

Update:

One of the O(N) ways to do it is DanD.'s suggestion, which can be written in a single line using the comprehension trick to avoid assigning an intermediate result to a variable:

list(range(*[(x[0][0], x[-1][1]+1) for x in [max((list(g) for k, g in groupby(zip(l, l[1:]), key=f) if k), key=len)]][0]))
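On Python 3.8+, an assignment expression (the "walrus" operator) is an alternative to the single-element comprehension trick. The sketch below is not part of the original answer. Unlike the comprehension variable, b stays bound after the line runs, so this is a stylistic choice rather than a strict improvement.

```python
from itertools import groupby

l = [1, 2, 5, 6, 7, 8, 10, 11, 12]
f = lambda x: x[1] - x[0] == 1

# (b := ...) binds the longest group inside the expression; call arguments are
# evaluated left to right, so b already exists when b[-1] is evaluated.
result = list(range((b := max((list(g) for k, g in groupby(zip(l, l[1:]), key=f) if k),
                              key=len))[0][0],
                    b[-1][1] + 1))
print(result)  # [5, 6, 7, 8]
```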
# [5, 6, 7, 8]

Prettier, however, it is not :D
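A different technique, not from the answer above, is arguably prettier. In a strictly increasing list, the consecutive integers in a run all share the same value minus index. As a result, groupby over enumerate(l) yields each contiguous run directly, with no pairs to reassemble. This is a sketch that assumes l is non-empty, sorted, and free of duplicates, which holds for "blacks":

```python
from itertools import groupby

l = [1, 2, 5, 6, 7, 8, 10, 11, 12]

# Within a run of consecutive integers, value - index is constant,
# so grouping on that difference splits l into its contiguous runs.
runs = [[v for _, v in g] for _, g in groupby(enumerate(l), key=lambda t: t[1] - t[0])]
longest = max(runs, key=len)
print(runs)                     # [[1, 2], [5, 6, 7, 8], [10, 11, 12]]
print(longest[0], longest[-1])  # 5 8
```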

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM