Python: find out whether a list of integers is coherent

I am trying to find out whether a list of integers is coherent, or 'at one stretch', meaning that the difference between two neighboring elements must be exactly one and that the numbers must be increasing monotonically. I found a neat approach where we group by the number in the list minus the position of the element in the list; this difference changes when the numbers are not coherent. Obviously, there should be exactly one group when the sequence does not contain gaps or repetitions.

Test:

>>> l1 = [1, 2, 3, 4, 5, 6]
>>> l2 = [1, 2, 3, 4, 5, 7]
>>> l3 = [1, 2, 3, 4, 5, 5]
>>> l4 = [1, 2, 3, 4, 5, 4]
>>> l5 = [6, 5, 4, 3, 2, 1]
>>> import itertools
>>> def is_coherent(seq):
...     return len(list(g for _, g in itertools.groupby(enumerate(seq), lambda (i,e): i-e))) == 1
... 
>>> is_coherent(l1)
True
>>> is_coherent(l2)
False
>>> is_coherent(l3)
False
>>> is_coherent(l4)
False
>>> is_coherent(l5)
False
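
To see why the grouping trick above works, it can help to look at the groups themselves. The following is a minimal Python 3 sketch (the thread itself targets Python 2.7); the helper name `difference_groups` is made up for illustration. It keys on value minus index rather than index minus value, which yields the same grouping.

```python
from itertools import groupby


def difference_groups(seq):
    """Group consecutive elements by (value - index).

    Within a run that increases by exactly one, value - index stays constant,
    so every gap, repetition, or decrease starts a new group.
    """
    groups = []
    for key, group in groupby(enumerate(seq), key=lambda pair: pair[1] - pair[0]):
        groups.append((key, [value for _, value in group]))
    return groups


def is_coherent(seq):
    # Exactly one group means no gaps and no repetitions.
    # Note: an empty list yields zero groups and is therefore not coherent here.
    return len(difference_groups(seq)) == 1


print(difference_groups([1, 2, 3, 4, 5, 6]))  # one group with key 1
print(difference_groups([1, 2, 3, 4, 5, 7]))  # the 7 lands in a second group
```

A strictly decreasing list such as l5 puts every element in its own group, which is why it is rejected as well.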

It works well, but I personally find that this solution is a bit too convoluted in view of the simplicity of the problem. Can you come up with a clearer way to achieve the same without significantly increasing the code length?

Edit: summary of answers

From the answers given below, the solution

def is_coherent(seq):
    return seq == range(seq[0], seq[-1]+1)

clearly wins. For small lists (10^3 elements), it is on the order of 10 times faster than the groupby approach and (on my machine) still four times faster than the next best approach (using izip_longest). It has the worst scaling behavior, but even for a large list with 10^8 elements it is still two times faster than the next best approach, which again is the izip_longest-based solution.

Relevant timing information obtained with timeit:

Testing is_coherent_groupby...
   small/large/larger/verylarge duration: 8.27 s, 20.23 s, 20.22 s, 20.76 s
   largest/smallest = 2.51
Testing is_coherent_npdiff...
   small/large/larger/verylarge duration: 7.05 s, 15.81 s, 16.16 s, 15.94 s
   largest/smallest = 2.26
Testing is_coherent_zip...
   small/large/larger/verylarge duration: 5.74 s, 20.54 s, 21.69 s, 24.62 s
   largest/smallest = 4.28
Testing is_coherent_izip_longest...
   small/large/larger/verylarge duration: 4.20 s, 10.81 s, 10.76 s, 10.81 s
   largest/smallest = 2.58
Testing is_coherent_all_xrange...
   small/large/larger/verylarge duration: 6.52 s, 17.06 s, 17.44 s, 17.30 s
   largest/smallest = 2.65
Testing is_coherent_range...
   small/large/larger/verylarge duration: 0.96 s, 4.14 s, 4.48 s, 4.48 s
   largest/smallest = 4.66

Testing code:

import itertools
import numpy as np
import timeit


setup = """
import itertools
import numpy as np
def is_coherent_groupby(seq):
    return len(list(g for _, g in itertools.groupby(enumerate(seq), lambda (i,e): i-e))) == 1

def is_coherent_npdiff(x):
    return all(np.diff(x) == 1)

def is_coherent_zip(seq):
    return all(x==y+1 for x, y in zip(seq[1:], seq))

def is_coherent_izip_longest(l):
    return all(a==b for a, b in itertools.izip_longest(l, xrange(l[0], l[-1]+1)))

def is_coherent_all_xrange(l):
    return all(l[i] + 1 == l[i+1] for i in xrange(len(l)-1))

def is_coherent_range(seq):
    return seq == range(seq[0], seq[-1]+1)


small_list = range(10**3)
large_list = range(10**6)
larger_list = range(10**7)
very_large_list = range(10**8)
"""


fs = [
    'is_coherent_groupby',
    'is_coherent_npdiff',
    'is_coherent_zip',
    'is_coherent_izip_longest',
    'is_coherent_all_xrange',
    'is_coherent_range'
    ]


for n in fs:
    print "Testing %s..." % n
    t1 = timeit.timeit(
        '%s(small_list)' % n, 
        setup,
        number=40000
        )      
    t2 = timeit.timeit(
        '%s(large_list)' % n, 
        setup,
        number=100
        )     
    t3 = timeit.timeit(
        '%s(larger_list)' % n, 
        setup,
        number=10
        )
    t4 =  timeit.timeit(
        '%s(very_large_list)' % n, 
        setup,
        number=1
        )
    print "   small/large/larger/verylarge duration: %.2f s, %.2f s, %.2f s, %.2f s" % (t1, t2, t3, t4)
    print "   largest/smallest = %.2f" % (t4/t1)
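
The four durations reported above are totals over different numbers of repetitions, so they do not cover the same amount of work. As a rough sketch (Python 3, whereas the benchmark itself is Python 2), the figures can be normalized to time per list element using the list sizes and `number=` values from the testing code. The helper name `ns_per_element` is made up for illustration.

```python
# Durations in seconds as reported above: small / large / larger / very large.
durations = {
    "is_coherent_groupby": (8.27, 20.23, 20.22, 20.76),
    "is_coherent_npdiff": (7.05, 15.81, 16.16, 15.94),
    "is_coherent_zip": (5.74, 20.54, 21.69, 24.62),
    "is_coherent_izip_longest": (4.20, 10.81, 10.76, 10.81),
    "is_coherent_all_xrange": (6.52, 17.06, 17.44, 17.30),
    "is_coherent_range": (0.96, 4.14, 4.48, 4.48),
}

# (list length, timeit repetitions) for each of the four runs in the testing code.
runs = ((10**3, 40000), (10**6, 100), (10**7, 10), (10**8, 1))


def ns_per_element(times):
    """Convert total durations to nanoseconds per processed list element."""
    return [t / (size * reps) * 1e9 for t, (size, reps) in zip(times, runs)]


for name, times in durations.items():
    print(name, ["%.1f" % value for value in ns_per_element(times)])
```

Normalized this way, is_coherent_groupby comes out at roughly 207 ns per element for both the smallest and the largest list. The small-list run processes 4 x 10^7 elements in total and the other runs 10^8, so a largest/smallest ratio near 2.5 is what linear scaling would produce.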

Test machine:

  • Linux 3.2.0 (Ubuntu 12.04)
  • Python 2.7.3 (gcc 4.1.2)
  • numpy 1.6.2 built with Intel compiler
  • CPU: E5-2650 @ 2.00GHz
  • 24 GB of memory

How about:

def is_coherent(my_list):
    sorted_list = sorted(my_list)
    return sorted_list == range(sorted_list[0], sorted_list[-1]+1)

Or, if the list only counts as coherent when it is already sorted:

def is_coherent(my_list):
    return my_list == range(my_list[0], my_list[-1]+1)

If you are using Python 3, you will need list(range(...)).
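
For reference, a minimal Python 3 sketch of the comparison above could look like this. The empty-list guard and the name `is_coherent_any_order` are additions for illustration, not part of the original answer.

```python
def is_coherent(seq):
    """Return True if seq increases by exactly one from element to element."""
    if not seq:
        # Assumption: an empty list is treated as not coherent
        # (seq[0] would otherwise raise an IndexError).
        return False
    # range() is lazy in Python 3, so materialize it before comparing to a list.
    return list(seq) == list(range(seq[0], seq[-1] + 1))


def is_coherent_any_order(seq):
    """Variant that sorts first, so the order of the input does not matter."""
    return is_coherent(sorted(seq))


print(is_coherent([1, 2, 3, 4, 5, 6]))            # True
print(is_coherent([6, 5, 4, 3, 2, 1]))            # False
print(is_coherent_any_order([6, 5, 4, 3, 2, 1]))  # True
```

Note that this builds a list spanning seq[0] to seq[-1], so a list with a very large gap between its first and last element allocates a correspondingly large temporary list.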

Unless I'm overlooking something in your examples, this simpler solution is actually shorter.

>>> l1 = [1, 2, 3, 4, 5, 6]
>>> l2 = [1, 2, 3, 4, 5, 7]
>>> l3 = [1, 2, 3, 4, 5, 5]
>>> l4 = [1, 2, 3, 4, 5, 4]
>>> l5 = [6, 5, 4, 3, 2, 1]
>>> 
>>> def is_coherent(seq):
...     return seq == range(seq[0], seq[0]+len(seq), 1)
... 
>>> is_coherent(l1)
True
>>> is_coherent(l2)
False
>>> is_coherent(l3)
False
>>> is_coherent(l4)
False
>>> is_coherent(l5)
False
>>> 

The results of some basic performance tests seem to indicate that this method is significantly quicker (I've added your example as is_coherent2):

Carl > python -m timeit -s 'from t import is_coherent, l1' 'is_coherent(l1)'
1000000 loops, best of 3: 0.782 usec per loop
Carl > python -m timeit -s 'from t import is_coherent, l3' 'is_coherent(l3)'
1000000 loops, best of 3: 0.796 usec per loop
Carl > python -m timeit -s 'from t import is_coherent2, l1' 'is_coherent2(l1)'
100000 loops, best of 3: 4.54 usec per loop
Carl > python -m timeit -s 'from t import is_coherent2, l3' 'is_coherent2(l3)'
100000 loops, best of 3: 4.93 usec per loop

If you're looking for a numpy solution:

import numpy as np

def is_coherent(x):
    return all(np.diff(x) == 1)

is_coherent(np.array([1,2,3,4,5]))
Out[39]: True

is_coherent(np.array([1,2,3,4,8]))
Out[40]: False

def is_coherent(seq):
    return all(x==y+1 for x, y in zip(seq[1:], seq))

This short-circuits and does not create an extra list, which makes it useful for testing very large lists:

from itertools import izip_longest

def is_coherent(l):
    return all(a==b for a, b in izip_longest(l, xrange(l[0], l[-1]+1)))

Or:

def is_coherent(l):
    return all(l[i] + 1 == l[i+1] for i in xrange(len(l)-1))
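
In Python 3, izip_longest is named zip_longest and xrange is simply range. A rough sketch of the same two lazy, short-circuiting ideas might look like the following. The function names, the empty-list guard, and the use of islice to avoid copying the list are assumptions added for illustration.

```python
from itertools import islice, zip_longest


def is_coherent_zip_longest(seq):
    """Compare seq element by element against the expected lazy range."""
    if not seq:
        # Assumption: an empty list is treated as not coherent.
        return False
    expected = range(seq[0], seq[-1] + 1)
    # zip_longest pads the shorter side with None, so a length mismatch fails.
    return all(a == b for a, b in zip_longest(seq, expected))


def is_coherent_pairs(seq):
    """Check that each element is exactly one more than its predecessor."""
    # islice iterates from the second element without copying the list;
    # all() stops at the first mismatching pair.
    return all(b - a == 1 for a, b in zip(seq, islice(seq, 1, None)))


print(is_coherent_zip_longest([1, 2, 3, 4, 5, 6]))  # True
print(is_coherent_pairs([1, 2, 3, 4, 5, 5]))        # False
```

Both variants stop at the first mismatch and never build a second list. Unlike the guarded version, `is_coherent_pairs` returns True for an empty or single-element list because there are no pairs to check.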

I don't know Python, but I know it's functional, so here's a small loop function that will do it, with the syntax adjusted to valid Python:

def is_coherent(seq):
    # Compare every element with its successor, starting at index 0.
    for x in xrange(len(seq) - 1):
        if seq[x+1] - seq[x] != 1:
            return False
    return True
