
Getting indices of patterns found in a sequence using Knuth-Morris-Pratt?

I am trying to find patterns in a sequence of integers. I found an implementation of the Knuth-Morris-Pratt (KMP) algorithm in this link.

I've fed the function a 'pattern' to find in a 'text', but the output of the KMP function is an object. I need the indices of the instances of the pattern in the text. I tried inspecting the object's attributes by typing a dot and pressing Tab, but nothing shows up. How can I get the indices?

Edit

Code:

> # Knuth-Morris-Pratt string matching
> # David Eppstein, UC Irvine, 1 Mar 2002
> 
> from __future__ import generators
> 
> def KnuthMorrisPratt(text, pattern):
> 
>     '''Yields all starting positions of copies of the pattern in the text.
>     Calling conventions are similar to string.find, but its arguments can be
>     lists or iterators, not just strings, it returns all matches, not just
>     the first one, and it does not need the whole text in memory at once.
>     Whenever it yields, it will have read the text exactly up to and
>     including the match that caused the yield.'''
> 
>     # allow indexing into pattern and protect against change during yield
>     pattern = list(pattern)
> 
>     # build table of shift amounts
>     shifts = [1] * (len(pattern) + 1)
>     shift = 1
>     for pos in range(len(pattern)):
>         while shift <= pos and pattern[pos] != pattern[pos-shift]:
>             shift += shifts[pos-shift]
>         shifts[pos+1] = shift
> 
>     # do the actual search
>     startPos = 0
>     matchLen = 0
>     for c in text:
>         while matchLen == len(pattern) or \
>               matchLen >= 0 and pattern[matchLen] != c:
>             startPos += shifts[matchLen]
>             matchLen -= shifts[matchLen]
>         matchLen += 1
>         if matchLen == len(pattern):
>             yield startPos

Sample Text: [1, 2, 2, 3, 3, 2, 4, 5, 2, 2, 3, 2]
Sample Pattern: [2, 2, 3]

Sample output: [1, 8] 

The function is a generator: it yields each match instead of returning a list, so calling it gives you a generator object rather than the indices themselves. To get the indices, iterate over that object, for example with a list comprehension:

from __future__ import generators

def KnuthMorrisPratt(text, pattern):

    pattern = list(pattern)

    # build table of shift amounts
    shifts = [1] * (len(pattern) + 1)
    shift = 1
    for pos in range(len(pattern)):
        while shift <= pos and pattern[pos] != pattern[pos-shift]:
            shift += shifts[pos-shift]
        shifts[pos+1] = shift

    # do the actual search
    startPos = 0
    matchLen = 0
    for c in text:        
        while matchLen == len(pattern) or matchLen >= 0 and pattern[matchLen] != c:
            startPos += shifts[matchLen]
            matchLen -= shifts[matchLen]
        matchLen += 1
        if matchLen == len(pattern):
            yield startPos


t = [1, 2, 2, 3, 3, 2, 4, 5, 2, 2, 3, 2]
p = [2, 2, 3]
[k for k in KnuthMorrisPratt(t, p)]

[1, 8]
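
As a further illustration, here is a minimal self-contained sketch of the different ways the generator can be consumed. The name `kmp_indices` and the variable names are my own (hypothetical); the body follows the same logic as the function above, with explicit parentheses around the `and` clause and without the `__future__` import, which I am assuming is unnecessary on any current Python version.

```python
def kmp_indices(text, pattern):
    """Yield every start index at which pattern occurs in text."""
    # Copy the pattern so it can be indexed even if it was an iterator
    pattern = list(pattern)

    # Build the table of shift amounts
    shifts = [1] * (len(pattern) + 1)
    shift = 1
    for pos in range(len(pattern)):
        while shift <= pos and pattern[pos] != pattern[pos - shift]:
            shift += shifts[pos - shift]
        shifts[pos + 1] = shift

    # Scan the text one element at a time
    start_pos = 0
    match_len = 0
    for c in text:
        while match_len == len(pattern) or (
            match_len >= 0 and pattern[match_len] != c
        ):
            start_pos += shifts[match_len]
            match_len -= shifts[match_len]
        match_len += 1
        if match_len == len(pattern):
            yield start_pos


text = [1, 2, 2, 3, 3, 2, 4, 5, 2, 2, 3, 2]
pattern = [2, 2, 3]

# Calling the function only creates a generator object; no search has run yet
matches = kmp_indices(text, pattern)

# Pull matches one at a time with next() ...
first = next(matches)

# ... or drain whatever is left into a list
rest = list(matches)

# Most direct way to get every index at once
all_indices = list(kmp_indices(text, pattern))

# The text may also be a one-shot iterator, since it is read only once
streamed = list(kmp_indices(iter(text), pattern))

print(first, rest, all_indices, streamed)
```

`list(...)` and the list comprehension are equivalent here. `next()` is mainly useful when you only need the first match or the text is too large to search in full.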
