Is there a Python builtin for determining if an iterable contains a certain sequence?

For example, something like:

>>> [1, 2, 3].contains_sequence([1, 2])
True
>>> [1, 2, 3].contains_sequence([4])
False

I know that the in operator can do this for strings:

>>> "12" in "123"
True

But I'm looking for something that operates on iterables.

Adapted from https://stackoverflow.com/a/6822773/24718, modified to use a list.

from itertools import islice

def window(seq, n=2):
    """
    Returns a sliding window (of width n) over data from the iterable
    s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...                   
    """
    it = iter(seq)
    result = list(islice(it, n))
    if len(result) == n:
        yield result    
    for elem in it:
        result = result[1:] + [elem]
        yield result

def contains_sequence(all_values, seq):
    seq = list(seq)  # window() yields lists, so compare against a list
    return any(seq == current_seq for current_seq in window(all_values, len(seq)))

test_iterable = [1,2,3]
search_sequence = [1,2]

result = contains_sequence(test_iterable, search_sequence)

Is there a Python builtin? No. You can accomplish this task in various ways. Here is a recipe that does it, and also gives you the position of the subsequence in the containing sequence:

def _search(forward, source, target, start=0, end=None):
    """Naive search for target in source."""
    m = len(source)
    n = len(target)
    if end is None:
        end = m
    else:
        end = min(end, m)
    if n == 0 or (end-start) < n:
        # target is empty, or longer than source, so obviously can't be found.
        return None
    if forward:
        x = range(start, end-n+1)
    else:
        x = range(end-n, start-1, -1)
    for i in x:
        if source[i:i+n] == target:
            return i
    return None
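
To show how a recipe like this might be called, here is a minimal self-contained sketch of the forward case only. The name `find_subsequence` is an assumption made for this example, not part of the recipe above. It returns the index of the first match, or None if there is none.

```python
def find_subsequence(source, target, start=0, end=None):
    """Return the index of the first occurrence of target in source, or None.

    Hypothetical helper: a forward-only simplification of the naive search.
    Both arguments must support len() and slicing, and should be the same
    type so that the slice comparison can succeed (list vs. list, and so on).
    """
    n = len(target)
    end = len(source) if end is None else min(end, len(source))
    if n == 0 or end - start < n:
        # Empty target, or target longer than the searched region.
        return None
    for i in range(start, end - n + 1):
        if source[i:i + n] == target:
            return i
    return None


print(find_subsequence([1, 2, 3, 1, 2], [1, 2]))           # 0
print(find_subsequence([1, 2, 3, 1, 2], [1, 2], start=1))  # 3
print(find_subsequence([1, 2, 3], [4]))                    # None
```

Like the recipe, this treats an empty target as "not found", which is a design choice rather than a requirement.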

As far as I know, there's no builtin way to do this. You can roll your own function pretty easily, but I doubt that will be terribly efficient.

>>> def contains_seq(seq,subseq):
...     #try: junk=seq[:]
...     #except: seq=tuple(seq)
...     #try: junk=subseq[:]
...     #except: subseq=tuple(subseq)
...     ll=len(subseq)
...     for i in range(len(seq)-ll+1):  # +1 so a match at the very end is found; on Python 2, use xrange.
...         if(seq[i:i+ll] == subseq):
...             return True
...     return False
...
>>> contains_seq(range(10),range(3)) #True
>>> contains_seq(range(10),[2,3,6]) #False

Note that this solution does not work with generator-type objects; it only works on objects that you can slice. You could check whether seq is sliceable before proceeding and cast it to a tuple if it isn't, but then you lose the benefits of slicing. You could also rewrite it to check one element at a time instead of using slicing, but I have a feeling performance would suffer even more.
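
The "cast to a tuple if it isn't sliceable" idea could look roughly like the sketch below. The name `contains_seq_any` is hypothetical, and using `collections.abc.Sequence` as the sliceability check is an assumption (it covers lists, tuples, ranges and strings, but not every sliceable object).

```python
from collections.abc import Sequence


def contains_seq_any(seq, subseq):
    """Return True if subseq occurs as a contiguous run inside seq.

    Hypothetical wrapper: a non-sequence seq (a generator, for example) is
    copied into a tuple first, which costs memory proportional to its length.
    """
    if not isinstance(seq, Sequence):
        seq = tuple(seq)
    # Normalise the needle to a tuple so that container types do not matter
    # when comparing (a list slice never equals a tuple).
    subseq = tuple(subseq)
    ll = len(subseq)
    # + 1 so that a match ending at the last element is also checked.
    for i in range(len(seq) - ll + 1):
        if tuple(seq[i:i + ll]) == subseq:
            return True
    return False


print(contains_seq_any((x for x in range(10)), [7, 8, 9]))  # True
print(contains_seq_any(range(10), [2, 3, 6]))               # False
```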

As others have said, there's no builtin for this. Here's an implementation that is potentially more efficient than the other answers I've seen. In particular, it scans through the iterable once, just keeping track of which prefix lengths of the target sequence it has matched so far. That increased efficiency comes at the cost of more verbosity than some of the other approaches that have been suggested.

from collections.abc import Sequence

def contains_seq(iterable, seq):
    """
    Returns true if the iterable contains the given sequence.
    """
    # The following clause is optional -- leave it if you want to allow `seq` to
    # be an arbitrary iterable; or remove it if `seq` will always be list-like.
    # (collections.Sequence was removed in Python 3.10; use collections.abc.)
    if not isinstance(seq, Sequence):
        seq = tuple(seq)

    if len(seq)==0: return True # corner case

    partial_matches = []
    for elt in iterable:
        # Try extending each of the partial matches by adding the
        # next element, if it matches.
        partial_matches = [m+1 for m in partial_matches if elt == seq[m]]
        # Check if we should start a new partial match
        if elt==seq[0]:
            partial_matches.append(1)
        # Check if we have a complete match (partial_matches will always
        # be sorted from highest to lowest, since older partial matches 
        # come before newer ones).
        if partial_matches and partial_matches[0]==len(seq):
            return True
    # No match found.
    return False

If preserving order is not necessary, you can use sets (builtin):

>>> set([1,2]).issubset([1,2,3])
True
>>> set([4]).issubset([1,2,3])
False
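
A quick check of what the set-based test does and does not guarantee may help here: it ignores order, adjacency and repetition, and it requires hashable elements. The wrapper name `contains_all` is hypothetical and only exists to make the example testable.

```python
def contains_all(items, iterable):
    """Hypothetical helper wrapping the set-based membership test."""
    return set(items).issubset(iterable)


print(contains_all([1, 2], [1, 2, 3]))  # True
print(contains_all([2, 1], [1, 2, 3]))  # True: order is ignored
print(contains_all([1, 3], [1, 2, 3]))  # True: elements need not be adjacent
print(contains_all([1, 1], [1, 2, 3]))  # True: repeated elements collapse
print(contains_all([4], [1, 2, 3]))     # False
```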

Otherwise:

def is_subsequence(sub, iterable):
    sub_pos, sub_len = 0, len(sub)
    for i in iterable:
        if i == sub[sub_pos]:
            sub_pos += 1
            if sub_pos >= sub_len:
                return True
        else:
            sub_pos = 0
    return False

>>> is_subsequence([1,2], [0,1,2,3,4])
True
>>> is_subsequence([2,1], [0,1,2,3,4]) # order preserved
False
>>> is_subsequence([1,2,4], [0,1,2,3,4])
False

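This one works with any iterator. One caveat: on a mismatch it resets sub_pos without re-checking the current element, so is_subsequence([1, 2], [1, 1, 2]) returns False.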
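
How a mismatch is handled matters when sub starts with repeated elements or when a failed attempt overlaps a real match (for example, sub = [1, 2] against [1, 1, 2]). One way to keep a single pass over an arbitrary iterator and still handle those cases is the Knuth-Morris-Pratt failure table. The sketch below is a swapped-in technique, not the method of the answer above, and the name `contains_run` is hypothetical. It assumes sub can be copied into a list.

```python
def contains_run(sub, iterable):
    """Return True if sub occurs as a contiguous run in iterable (KMP scan)."""
    sub = list(sub)
    if not sub:
        return True  # an empty pattern is treated as trivially present

    # fail[j] = length of the longest proper prefix of sub[:j + 1]
    # that is also a suffix of it.
    fail = [0] * len(sub)
    k = 0
    for j in range(1, len(sub)):
        while k and sub[j] != sub[k]:
            k = fail[k - 1]
        if sub[j] == sub[k]:
            k += 1
        fail[j] = k

    pos = 0  # number of pattern elements currently matched
    for item in iterable:
        # On a mismatch, fall back to the longest shorter partial match
        # instead of discarding everything.
        while pos and item != sub[pos]:
            pos = fail[pos - 1]
        if item == sub[pos]:
            pos += 1
            if pos == len(sub):
                return True
    return False


print(contains_run([1, 2], iter([1, 1, 2])))        # True
print(contains_run([1, 1, 2], iter([1, 1, 1, 2])))  # True
print(contains_run([2, 1], [0, 1, 2, 3, 4]))        # False
```

Each element of the iterable is consumed exactly once, so this also works for generators and other one-shot iterators.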

deque appears to be useful here:

from collections import deque

def contains(it, seq):
    seq = deque(seq)
    deq = deque(maxlen=len(seq))
    for p in it:
        deq.append(p)
        if deq == seq:
            return True
    return False

Note that this accepts arbitrary iterables for both arguments (no slicing required).

As there's no builtin, I made a nice version:

import itertools as it

def contains(seq, sub):
    seq = iter(seq)
    o = object()
    return any(all(i==j for i,j in zip(sub, it.chain((n,),seq, 
                                      (o for i in it.count())))) for n in seq)

This does not require any extra lists (if you use it.izip on Python 2, or Python 3, where zip is already lazy).

>>> contains([1,2,3], [1,2])
True
>>> contains([1,2,3], [1,2,3])
True
>>> contains([1,2,3], [2,3])
True
>>> contains([1,2,3], [2,3,4])
False

Extra points if you have no trouble reading it. (It does the job for the examples above, but the implementation is not to be taken too seriously: a failed attempt consumes elements from the shared iterator, so contains([1,1,2], [1,2]) returns False.) ;)

You could convert it into a string and then do matching on it:

full_list = " ".join([str(x) for x in [1, 2, 3]])
seq = " ".join([str(x) for x in [1, 2]])
seq in full_list
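
One thing to watch with the string approach is that a plain substring test can match across element boundaries: "1 2" is a substring of "11 2", even though [1, 2] does not occur in [11, 2]. A minimal sketch of one way around this is to wrap both strings in the separator. The helper name `contains_via_str` and its `sep` parameter are assumptions for this example, and it only works if no element's string form contains the separator.

```python
def contains_via_str(full, sub, sep=" "):
    """Hypothetical helper: substring test on separator-wrapped strings.

    Assumes str(x) never contains sep, and that elements which should be
    treated as different have different string forms (1 and "1" would be
    indistinguishable here).
    """
    haystack = sep + sep.join(str(x) for x in full) + sep
    needle = sep + sep.join(str(x) for x in sub) + sep
    return needle in haystack


print(contains_via_str([1, 2, 3], [1, 2]))  # True
print(contains_via_str([11, 2], [1, 2]))    # False
print("1 2" in "11 2")                      # True: the unwrapped test is fooled
```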


