Group consecutive integers and tolerate gaps of 1

In Python, given a sorted list of integers, I would like to group them by consecutive values and tolerate gaps of 1.

For instance, given a list my_list:

In [66]: my_list
Out[66]: [0, 1, 2, 3, 5, 6, 10, 11, 15, 16, 18, 19, 20]

I would like the following output:

[[0, 1, 2, 3, 5, 6], [10, 11], [15, 16, 18, 19, 20]]

Now, if I didn't have to tolerate gaps of 1, I could apply the neat solution explained here:

import itertools
import operator

results = []
# index - value is constant within a run of strictly consecutive integers
for k, g in itertools.groupby(enumerate(my_list), lambda pair: pair[0] - pair[1]):
    group = list(map(operator.itemgetter(1), g))
    results.append(group)
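To make the limitation concrete, here is a small sketch (written for Python 3, which is an assumption about the reader's interpreter) of what the strict grouping produces for my_list. The names keys and strict are only illustrative. Because the key is index minus value, any gap at all changes the key, so 3 and 5 land in different groups.

```python
from itertools import groupby

my_list = [0, 1, 2, 3, 5, 6, 10, 11, 15, 16, 18, 19, 20]

# The key used by the strict solution: index minus value.
keys = [i - x for i, x in enumerate(my_list)]

# Group on that key and keep only the values, dropping the indices.
strict = [
    [x for _, x in g]
    for _, g in groupby(enumerate(my_list), key=lambda pair: pair[0] - pair[1])
]

print(keys)    # [0, 0, 0, 0, -1, -1, -4, -4, -7, -7, -8, -8, -8]
print(strict)  # [[0, 1, 2, 3], [5, 6], [10, 11], [15, 16], [18, 19, 20]]
```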

Is there a way to incorporate my extra requirement in the above solution? If not, what's the best way to tackle the problem?

When in doubt you can always write your own generator:

def group_runs(li,tolerance=2):
    out = []
    last = li[0]
    for x in li:
        if x-last > tolerance:
            yield out
            out = []
        out.append(x)
        last = x
    yield out

Demo:

list(group_runs(my_list))
Out[48]: [[0, 1, 2, 3, 5, 6], [10, 11], [15, 16, 18, 19, 20]]
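One caveat with the generator above is that li[0] raises IndexError on an empty list, and indexing rules out plain iterators. A possible variant, sketched here under the hypothetical name group_runs_iter, pulls the first element with next() instead. It is an illustration of the same idea, not a replacement endorsed by the answer's author.

```python
def group_runs_iter(iterable, tolerance=2):
    """Yield runs of sorted values whose neighbours differ by at most `tolerance`."""
    it = iter(iterable)
    try:
        last = next(it)          # first element, if there is one
    except StopIteration:
        return                   # empty input: yield nothing
    out = [last]
    for x in it:
        if x - last > tolerance:
            yield out            # gap too large: close the current run
            out = []
        out.append(x)
        last = x
    yield out


my_list = [0, 1, 2, 3, 5, 6, 10, 11, 15, 16, 18, 19, 20]
print(list(group_runs_iter(my_list)))
# [[0, 1, 2, 3, 5, 6], [10, 11], [15, 16, 18, 19, 20]]
print(list(group_runs_iter(iter([]))))
# []
```

With tolerance=1 the same function reproduces the strict grouping, since only a difference of exactly 1 keeps a run going.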

NumPy is a very useful tool, and not very difficult to learn.

This problem is solvable in O(n) with a single line of code (excluding imports, data, and the conversion to a list, if you really need it):

from numpy import diff, where, split
l = [0, 1, 2, 3, 5, 6, 10, 11, 15, 16, 18, 19, 20]
# split wherever the gap to the previous element exceeds 2
result = split(l, where(diff(l) > 2)[0] + 1)
print([group.tolist() for group in result])

More importantly, the code is very fast if you need to process large lists, unlike a pure-Python solution.
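For readers new to NumPy, the one-liner can be unpacked into its three steps. This is only a sketch for illustration. The intermediate names gaps and cut_points are made up here, and it assumes NumPy is installed.

```python
import numpy as np

values = np.array([0, 1, 2, 3, 5, 6, 10, 11, 15, 16, 18, 19, 20])

# Step 1: difference between each element and its predecessor.
gaps = np.diff(values)

# Step 2: positions where the gap exceeds 2; +1 turns "index of the gap"
# into "index of the first element of the next group".
cut_points = np.where(gaps > 2)[0] + 1

# Step 3: cut the array at those positions.
groups = [g.tolist() for g in np.split(values, cut_points)]

print(gaps.tolist())        # [1, 1, 1, 2, 1, 4, 1, 4, 1, 2, 1, 1]
print(cut_points.tolist())  # [6, 8]
print(groups)               # [[0, 1, 2, 3, 5, 6], [10, 11], [15, 16, 18, 19, 20]]
```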

Remember, groupby by itself is pretty lame. The strength of itertools.groupby is the key. For this particular problem, you need to create an appropriate key with memory (a stateless key will not work here).

Implementation

from itertools import groupby

class Key(object):
    def __init__(self, diff):
        self.diff, self.flag, self.prev = diff, [0,1], None
    def __call__(self, elem):
        # compare against None explicitly: a previous element of 0 is falsy
        if self.prev is not None and abs(self.prev - elem) > self.diff:
            self.flag = self.flag[::-1]
        self.prev= elem
        return self.flag[0]

Object

[list(g) for k, g in groupby(my_list, key = Key(2))]
[[0, 1, 2, 3, 5, 6], [10, 11], [15, 16, 18, 19, 20]]

How it Works

Every time a new sub-list needs to be created ( curr - prev > threshold ), you toggle a flag. There are different ways to toggle a flag:

  • flag = 1; flag *= -1
  • flag = [0, 1 ]; flag = flag[::-1]
  • flag = 0; flag = 0 if flag else 1

Choose whichever your heart desires.

So this generates an accompanying key along with your list:

[0, 1, 2, 3, 5, 6, 10, 11, 15, 16, 18, 19, 20]
[0, 0, 0, 0, 0, 0, 1,  1,  0,  0,  0,  0 , 0]
             <------>  <------>
          Toggle flag  Toggle flag
          0 -> 1, as   1 -> 0, as
          10 - 6 > 2   15 - 11 > 2

Now, as itertools.groupby groups consecutive elements with the same key, all elements whose keys form a run of consecutive 0s or 1s are grouped together:

[0, 1, 2, 3, 5, 6, 10, 11, 15, 16, 18, 19, 20]
[0, 0, 0, 0, 0, 0, 1,  1,  0,  0,  0,  0 , 0]

[0, 1, 2, 3, 5, 6], [10, 11], [15, 16, 18, 19, 20]
[0, 0, 0, 0, 0, 0], [1,  1],  [0,  0,  0,  0 , 0]

And your final result would be:

[0, 1, 2, 3, 5, 6], [10, 11], [15, 16, 18, 19, 20]
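The key sequence shown above can be reproduced with a few lines. The sketch below uses a closure instead of a class (make_toggle_key is a hypothetical helper name, not part of itertools) and prints the key next to the grouping. Note that a fresh key function is needed for every pass, because it remembers the previous element.

```python
from itertools import groupby

def make_toggle_key(max_diff):
    """Return a stateful key function that flips between 0 and 1 at each large gap."""
    prev = None
    flag = 0

    def key(elem):
        nonlocal prev, flag
        if prev is not None and elem - prev > max_diff:
            flag = 1 - flag      # toggle 0 <-> 1
        prev = elem
        return flag

    return key


my_list = [0, 1, 2, 3, 5, 6, 10, 11, 15, 16, 18, 19, 20]

key_fn = make_toggle_key(2)
keys = [key_fn(x) for x in my_list]
grouped = [list(g) for _, g in groupby(my_list, key=make_toggle_key(2))]

print(keys)     # [0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]
print(grouped)  # [[0, 1, 2, 3, 5, 6], [10, 11], [15, 16, 18, 19, 20]]
```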

An O(n log n) solution (assuming the input list isn't sorted) is to first sort the list you're given, then iterate through each value, creating a new group whenever the difference between the current value and the previous value is at least 3.

Demo

>>> my_list = [0, 1, 2, 3, 5, 6, 10, 11, 15, 16, 18, 19, 20]
>>> my_list.sort() # if we can't assume the list is sorted beforehand
>>> groups = [[my_list[0]]] # initialize with the first value in the list
>>> for val in my_list[1:]:
...     if val - groups[-1][-1] > 2:
...         groups.append( [val] ) # create a new group
...     else:
...         groups[-1].append( val ) # append to the most recent group
... 
>>> groups
[[0, 1, 2, 3, 5, 6], [10, 11], [15, 16, 18, 19, 20]]
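The same steps can be wrapped in a function that accepts unsorted input. This is a minimal sketch. The name group_with_gap and the max_gap parameter are assumptions made for illustration. It sorts a copy rather than modifying the caller's list, and it returns an empty list for empty input.

```python
def group_with_gap(values, max_gap=2):
    """Sort `values`, then start a new group whenever the step exceeds `max_gap`."""
    groups = []
    for val in sorted(values):               # O(n log n) because of the sort
        if groups and val - groups[-1][-1] <= max_gap:
            groups[-1].append(val)           # close enough: extend the last group
        else:
            groups.append([val])             # first value, or gap too large
    return groups


shuffled = [18, 0, 6, 11, 2, 20, 15, 1, 19, 3, 10, 5, 16]
print(group_with_gap(shuffled))
# [[0, 1, 2, 3, 5, 6], [10, 11], [15, 16, 18, 19, 20]]
```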

I generally use zip when I want to deal with consecutive elements, and you can use islice if you want to avoid building the list slice:

from itertools import islice

def group(lst, tol=1):
    """Group vals in sorted sequence lst, allow tol between consecutive vals."""
    if not lst:
        return []  # lst[-1] below would fail on an empty sequence
    output = [[]]
    for current, next_ in zip(lst, islice(lst, 1, None)):
        output[-1].append(current)
        if next_ > current + tol + 1:
            output.append([])
    output[-1].append(lst[-1])
    return output

Note that in Python 2.x, you need to use itertools.izip to avoid building the list of 2-tuples (current, next_).
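To see what the zip/islice pairing iterates over, the following sketch lists the first few (current, next_) pairs and the pairs at which a new group starts. The names pairs and breaks are illustrative only, and the pairs are collected into a list purely so they can be printed.

```python
from itertools import islice

lst = [0, 1, 2, 3, 5, 6, 10, 11, 15, 16, 18, 19, 20]
tol = 1

# Each element paired with its successor; islice avoids copying lst[1:].
pairs = list(zip(lst, islice(lst, 1, None)))

# A new group starts after `a` whenever `b` is more than tol + 1 away.
breaks = [(a, b) for a, b in pairs if b > a + tol + 1]

print(pairs[:3])  # [(0, 1), (1, 2), (2, 3)]
print(breaks)     # [(6, 10), (11, 15)]
```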

Here's what I came up with. There's a bit of verbose initialization but it gets the job done. =)

output = []
prev = my_list[0]
temp_list = [my_list[0]]

for num in my_list[1:]:
    if num-2 > prev:
        output += [temp_list]
        temp_list = [num]
    else:
        temp_list.append(num)
    prev = num
output.append(temp_list)

print(output)

# [[0, 1, 2, 3, 5, 6], [10, 11], [15, 16, 18, 19, 20]]
