KMP string matching algorithm: auxiliary array output

This is my implementation of the KMP string matching algorithm. When I check the pi array, it stores 0, 1, 2, 3, 4, 5, 6, but according to algorithm textbooks it should be 0, 0, 1, 2, 3, 0, 1. My code still gives the correct result. I don't understand why this is happening. Am I doing something wrong? If so, please correct me.

Thanks.

#include<iostream>
#include<string>
#include<string.h>

using namespace std;

int* ComputePrefix(char P[])
{
    size_t m = strlen(P);
    int *pi = new int[m];
    pi[0] = 0;
    int k = 0;

    for(int q =0; q < m; q++)
    {
        if( k > 0 && P[k+1] != P[q])
            k = pi[k];

        if( P[k+1] == P[q])
            {
                pi[q] = k;
                k = k + 1;
            }
            pi[q]=k;
    }

    return (pi);
}

void KMP_Matcher(char T[], char P[])
{

    size_t n = strlen(T);
    size_t m = strlen(P);

    int *pi = new int[m];
    pi = ComputePrefix(P);

    cout<<endl;


    int q =0;
    for (int i = 0; i <= n; i++)
    {
        if( q > 0 && P[q] != T[i] )
        {
            q = pi[q - 1];
        }


        else if( P[q] == T[i])
        {


            if( q == m-1)
            {
                cout<<"Shift occurs at : "<< i-q <<endl;
                q = pi[q];
            }
            else q = q + 1;
        }

        else q++;
    }
}


int main()
{
    char T[] = "abababacaba";
    char P[] = "ababaca";

    KMP_Matcher(T,P);
    return 0;
}

Your jump-table construction function simply does not check the needle for prefixes. With your pattern, from q = 1 onward k always equals q - 1, so P[k+1] and P[q] are the same character; the comparison always succeeds and pi just counts up 0, 1, 2, and so on. We want to be able to look up, for each position in the needle, the length of the longest possible proper prefix of the needle leading up to (but not including) that position, other than the full prefix starting at needle[0] that just failed to match; this is how far we have to backtrack when looking for the next match. Hence each entry in the jump table (say, table[i]) is exactly the length of the longest possible proper prefix of the needle that is also a suffix of the substring ending at needle[i - 1].

The first two entries in the jump table are -1 and 0, since (a) a mismatch at the start of the pattern does not trigger backtracking (in other words, a prefix of zero length cannot have any proper prefixes or suffixes) and (b) the empty string is considered to have length 0.
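
To make the definition concrete, here is a brute-force sketch that fills the table straight from the definition, with no backtracking trick. The name border_table_bruteforce and the use of std::string and std::vector are my own choices for illustration; the only purpose is to show which values a correct table should hold.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: builds the jump table directly from its definition.
// table[0] is the -1 sentinel; for i >= 1, table[i] is the length of the
// longest proper prefix of needle[0..i) that is also a suffix of needle[0..i).
std::vector<int> border_table_bruteforce(const std::string& needle)
{
    std::vector<int> table(needle.size() + 1, 0);
    table[0] = -1;
    for (std::size_t i = 1; i <= needle.size(); ++i) {
        // Try candidate lengths from longest to shortest; len < i keeps the prefix proper.
        for (std::size_t len = i - 1; len > 0; --len) {
            // Compare the first len characters with the last len characters of needle[0..i).
            if (needle.compare(0, len, needle, i - len, len) == 0) {
                table[i] = static_cast<int>(len);
                break;
            }
        }
    }
    return table;
}
```

For the pattern "ababaca" this yields -1, 0, 0, 1, 2, 3, 0, 1. Dropping the leading -1 gives the 0, 0, 1, 2, 3, 0, 1 that the textbooks list. It is far too slow for real use, but handy for checking a faster implementation against.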

For more details, see Wikipedia or an algorithms textbook.

The code to accomplish the above is:

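#include <cstring>  // strlen, NULL

int *build_jump_table(const char *target)
{
    if (!target)
        return NULL;
    // One extra slot: table[i + 1] describes the prefix target[0..i].
    // Plain new throws std::bad_alloc on failure, so no null check is needed.
    // The caller owns the array and must release it with delete[].
    int *table = new int[strlen(target) + 1];
    table[0] = -1; /* unused by the matcher, just used here */

    for (int i = 0; target[i] != '\0'; i++) {
        // Start by extending the previous match length by one character ...
        table[i + 1] = table[i] + 1;
        // ... and fall back to shorter candidates while the extension does not match.
        while (table[i + 1] > 0 && target[i] != target[table[i + 1] - 1]) {
            table[i + 1] = table[table[i + 1] - 1] + 1;
        }
    }
    return table;
}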

This is quite verbose and can be simplified a lot once you understand the concept behind the jump table.
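
One possible simplification, as a sketch only: the textbook-style prefix function without the -1 sentinel, plus a matcher that uses it. The names compute_prefix and kmp_search, and the switch to std::string and std::vector (which avoids the manual new[] and the leak in the question's KMP_Matcher), are assumptions of mine rather than anything from the code above.

```cpp
#include <string>
#include <vector>

// pi[q] = length of the longest proper prefix of pattern[0..q] that is also its suffix.
std::vector<std::size_t> compute_prefix(const std::string& pattern)
{
    std::vector<std::size_t> pi(pattern.size(), 0);
    std::size_t k = 0;  // length of the currently matched prefix
    for (std::size_t q = 1; q < pattern.size(); ++q) {
        // Fall back to shorter borders until the next character matches.
        while (k > 0 && pattern[k] != pattern[q])
            k = pi[k - 1];
        if (pattern[k] == pattern[q])
            ++k;
        pi[q] = k;
    }
    return pi;
}

// Returns the starting index of every occurrence of pattern in text.
std::vector<std::size_t> kmp_search(const std::string& text, const std::string& pattern)
{
    std::vector<std::size_t> shifts;
    if (pattern.empty())
        return shifts;  // assumption: an empty pattern reports no matches
    const std::vector<std::size_t> pi = compute_prefix(pattern);
    std::size_t q = 0;  // number of pattern characters matched so far
    for (std::size_t i = 0; i < text.size(); ++i) {
        while (q > 0 && pattern[q] != text[i])
            q = pi[q - 1];
        if (pattern[q] == text[i])
            ++q;
        if (q == pattern.size()) {
            shifts.push_back(i + 1 - q);  // the match ends at index i
            q = pi[q - 1];                // keep going to find overlapping matches
        }
    }
    return shifts;
}
```

With the question's inputs, compute_prefix("ababaca") gives 0, 0, 1, 2, 3, 0, 1 and kmp_search("abababacaba", "ababaca") reports a single match at shift 2. Note that the loop in compute_prefix starts at q = 1 and compares pattern[k] with pattern[q], so a character is never compared with itself.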
