[英]Is there a Python builtin for determining if an iterable contained a certain sequence?
For example, something like: 例如,类似于:
>>> [1, 2, 3].contains_sequence([1, 2])
True
>>> [1, 2, 3].contains_sequence([4])
False
I know that the in
operator can do this for strings: 我知道in
运算符可以为字符串执行此操作:
>>> "12" in "123"
True
But I'm looking for something that operates on iterables. 但我正在寻找可以在迭代上运行的东西。
Referenced from https://stackoverflow.com/a/6822773/24718 modified to use a list. 参考https://stackoverflow.com/a/6822773/24718修改后使用列表。
from itertools import islice
def window(seq, n=2):
"""
Returns a sliding window (of width n) over data from the iterable
s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...
"""
it = iter(seq)
result = list(islice(it, n))
if len(result) == n:
yield result
for elem in it:
result = result[1:] + [elem]
yield result
def contains_sequence(all_values, seq):
return any(seq == current_seq for current_seq in window(all_values, len(seq)))
test_iterable = [1,2,3]
search_sequence = [1,2]
result = contains_sequence(test_iterable, search_sequence)
Is there a Python builtin? 是否内置了Python? No. You can accomplish this task in various ways. 不可以。您可以通过各种方式完成此任务。 Here is a recipe that does it, and also gives you the position of the subsequence in the containing sequence: 这是一个执行它的配方 ,并且还为您提供包含序列中子序列的位置:
def _search(forward, source, target, start=0, end=None):
"""Naive search for target in source."""
m = len(source)
n = len(target)
if end is None:
end = m
else:
end = min(end, m)
if n == 0 or (end-start) < n:
# target is empty, or longer than source, so obviously can't be found.
return None
if forward:
x = range(start, end-n+1)
else:
x = range(end-n, start-1, -1)
for i in x:
if source[i:i+n] == target:
return i
return None
As far as I know, there's no way to do this. 据我所知,没有办法做到这一点。 You can roll your own function pretty easily, but I doubt that will be terribly efficient. 你可以很容易地推出自己的功能,但我怀疑它会非常有效。
>>> def contains_seq(seq,subseq):
... #try: junk=seq[:]
... #except: seq=tuple(seq)
... #try: junk=subseq[:]
... #except: subseq=tuple(subseq)
... ll=len(subseq)
... for i in range(len(seq)-ll): #on python2, use xrange.
... if(seq[i:i+ll] == subseq):
... return True
... return False
...
>>> contains_seq(range(10),range(3)) #True
>>> contains_seq(range(10),[2,3,6]) #False
Note that this solution does not work with generator type objects (it only works on objects that you can slice). 请注意,此解决方案不适用于生成器类型对象(它仅适用于您可以切片的对象)。 You could check seq
to see if it is sliceable before proceeding and cast to a tuple
if it isn't sliceable -- But then you get rid of the benefits of slicing. 您可以检查seq
以查看它是否可切片,然后继续并转换为tuple
如果它不可切片) - 但是您可以摆脱切片的好处。 You could re-write it to check one element at a time instead of using slicing, but I have a feeling performance would suffer even more. 您可以重新编写它以一次检查一个元素而不是使用切片,但我感觉性能会受到更多影响。
As others have said, there's no builtin for this. 正如其他人所说的那样,没有内置因素。 Here's an implementation that is potentially more efficient than the other answers I've seen -- in particular, it scans through the iterable, just keeping track of what prefix sizes of the target sequence it's seen. 这是一个可能比我见过的其他答案更有效的实现 - 特别是,它扫描迭代,只是跟踪它所看到的目标序列的前缀大小。 But that increased efficiency comes at some expense in increased verbosity over some of the other approaches that have been suggested. 但是,与其他一些已经提出的方法相比,提高效率会带来一些代价。
def contains_seq(iterable, seq):
"""
Returns true if the iterable contains the given sequence.
"""
# The following clause is optional -- leave it if you want to allow `seq` to
# be an arbitrary iterable; or remove it if `seq` will always be list-like.
if not isinstance(seq, collections.Sequence):
seq = tuple(seq)
if len(seq)==0: return True # corner case
partial_matches = []
for elt in iterable:
# Try extending each of the partial matches by adding the
# next element, if it matches.
partial_matches = [m+1 for m in partial_matches if elt == seq[m]]
# Check if we should start a new partial match
if elt==seq[0]:
partial_matches.append(1)
# Check if we have a complete match (partial_matches will always
# be sorted from highest to lowest, since older partial matches
# come before newer ones).
if partial_matches and partial_matches[0]==len(seq):
return True
# No match found.
return False
If preserving of order is not necessary, you can use sets (builtin): 如果不需要保留顺序,则可以使用集合(内置):
>>> set([1,2]).issubset([1,2,3])
True
>>> set([4]).issubset([1,2,3])
False
Otherwise: 除此以外:
def is_subsequence(sub, iterable):
sub_pos, sub_len = 0, len(sub)
for i in iterable:
if i == sub[sub_pos]:
sub_pos += 1
if sub_pos >= sub_len:
return True
else:
sub_pos = 0
return False
>>> is_subsequence([1,2], [0,1,2,3,4])
True
>>> is_subsequence([2,1], [0,1,2,3,4]) # order preserved
False
>>> is_subsequence([1,2,4], [0,1,2,3,4])
False
This one works with any iterator. 这适用于任何迭代器。
deque appears to be useful here: deque似乎在这里很有用:
from collections import deque
def contains(it, seq):
seq = deque(seq)
deq = deque(maxlen=len(seq))
for p in it:
deq.append(p)
if deq == seq:
return True
return False
Note that this accepts arbitrary iterables for both arguments (no slicing required). 请注意,这接受两个参数的任意迭代(不需要切片)。
As there's no builtin, I made a nice version: 由于没有内置,我做了一个很好的版本:
import itertools as it
def contains(seq, sub):
seq = iter(seq)
o = object()
return any(all(i==j for i,j in zip(sub, it.chain((n,),seq,
(o for i in it.count())))) for n in seq)
This do not require any extra lists (if you use it.izip
or Py3k). 这不需要任何额外的列表 (如果你使用它it.izip
或Py3k)。
>>> contains([1,2,3], [1,2])
True
>>> contains([1,2,3], [1,2,3])
True
>>> contains([1,2,3], [2,3])
True
>>> contains([1,2,3], [2,3,4])
False
Extra points if you have no trouble reading it. 如果您在阅读时没有问题,请加分。 (It does the job, but the implementation is not to be taked too seriously). (它完成了这项工作,但实施不应过于严肃)。 ;) ;)
You could convert it into a string and then do matching on it 您可以将其转换为字符串,然后对其进行匹配
full_list = " ".join([str(x) for x in [1, 2, 3]])
seq = " ".join([str(x) for x in [1, 2]])
seq in full_list
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.