KMP string matching algorithm: auxiliary array output

This is my implementation of the KMP string matching algorithm. When I check the pi array, it stores 0,1,2,3,4,5,6, but according to algorithm books it should be 0,0,1,2,3,0,1. My code also gives the correct result. I don't understand why this is happening. Am I doing something wrong? If so, please correct me.

Thanks.

#include<iostream>
#include<string>
#include<string.h>

using namespace std;

int* ComputePrefix(char P[])
{
    size_t m = strlen(P);
    int *pi = new int[m];
    pi[0] = 0;
    int k = 0;

    for(int q =0; q < m; q++)
    {
        if( k > 0 && P[k+1] != P[q])
            k = pi[k];

        if( P[k+1] == P[q])
            {
                pi[q] = k;
                k = k + 1;
            }
            pi[q]=k;
    }

    return (pi);
}

void KMP_Matcher(char T[], char P[])
{

    size_t n = strlen(T);
    size_t m = strlen(P);

    int *pi = new int[m];
    pi = ComputePrefix(P);

    cout<<endl;


    int q =0;
    for (int i = 0; i <= n; i++)
    {
        if( q > 0 && P[q] != T[i] )
        {
            q = pi[q - 1];
        }


        else if( P[q] == T[i])
        {


            if( q == m-1)
            {
                cout<<"Shift occurs at : "<< i-q <<endl;
                q = pi[q];
            }
            else q = q + 1;
        }

        else q++;
    }
}


int main()
{
    char T[] = "abababacaba";
    char P[] = "ababaca";

    KMP_Matcher(T,P);
    return 0;
}

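Your jump table construction function never actually compares the needle against its own prefixes. For P = "ababaca", the first iteration (q = 0) compares P[1] with P[0], which fails, so k stays 0. From q = 1 onward k + 1 == q, so the test P[k+1] == P[q] compares each character with itself, always succeeds, and k just increments: that is why pi comes out as 0,1,2,3,4,5,6. (The P[k+1] form comes from 1-indexed textbook pseudocode; in 0-indexed code the comparison should be P[k] against P[q], with q starting at 1.)

What we want is to look up, for each position in the needle, how much of the current partial match can be kept when a mismatch happens there. That is the length of the longest proper prefix of the already-matched part of the needle that is also a suffix of that part (the prefix must be proper because the full match is exactly what just failed). Hence each entry table[i] in the jump table is exactly the length of the longest proper prefix of needle[0..i-1] that is also a suffix of needle[0..i-1].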
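For comparison, here is a minimal sketch of a 0-indexed prefix function that produces the textbook values. compute_prefix is a hypothetical name (not the question's ComputePrefix), and it returns a std::vector<int> instead of a raw new[] array. Here pi[q] is the length of the longest proper prefix of P[0..q] that is also a suffix of P[0..q], so pi[q] equals table[q + 1] in the jump table described above.

```cpp
#include <cstring>
#include <vector>

// Hypothetical 0-indexed prefix function (sketch, not the question's code).
// pi[q] = length of the longest proper prefix of P[0..q] that is also a
// suffix of P[0..q]. pi[0] is always 0.
std::vector<int> compute_prefix(const char* P) {
    const std::size_t m = std::strlen(P);
    std::vector<int> pi(m, 0);
    int k = 0;  // length of the prefix currently matched against P[..q-1]
    for (std::size_t q = 1; q < m; ++q) {
        // Fall back through shorter borders until P[k] can extend the match.
        while (k > 0 && P[k] != P[q])
            k = pi[k - 1];
        if (P[k] == P[q])
            ++k;  // the border grows by one character
        pi[q] = k;
    }
    return pi;
}
```

Besides comparing P[k] instead of P[k+1], note that the fallback is a while loop: the question's version uses a single if, which only falls back one level. For "ababaca" this sketch returns 0,0,1,2,3,0,1.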

The first two entries in the jump table are -1 and 0. table[0] = -1 is a sentinel: a mismatch at the very first character of the pattern does not trigger any backtracking (a prefix of zero length cannot have any proper prefixes or suffixes), so the search simply moves on in the text. table[1] = 0 because the only proper prefix of a one-character string is the empty string, whose length is 0.
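To check a table against this definition, one option is a brute-force helper that tries every proper prefix length directly. jump_table_by_definition is a hypothetical name; the helper is O(m^3) and only meant for testing, but it makes the sentinel and the border lengths explicit.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical brute-force reference, O(m^3), for testing only.
// table[0] = -1 (sentinel); for i >= 1, table[i] is the length of the
// longest proper prefix of target[0..i-1] that is also its suffix.
std::vector<int> jump_table_by_definition(const std::string& target) {
    std::vector<int> table(target.size() + 1);
    table[0] = -1;
    for (std::size_t i = 1; i <= target.size(); ++i) {
        int best = 0;  // the empty prefix always qualifies
        // Compare the first len characters with the last len characters
        // of target[0..i-1]; len < i keeps the prefix proper.
        for (std::size_t len = 1; len < i; ++len) {
            if (target.compare(0, len, target, i - len, len) == 0)
                best = static_cast<int>(len);
        }
        table[i] = best;
    }
    return table;
}
```

For "ababaca" it returns -1,0,0,1,2,3,0,1; dropping the sentinel gives the 0,0,1,2,3,0,1 the question expected.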

For more details, see Wikipedia or an algorithms textbook.

The code to accomplish the above is:

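#include <cstring>

// Returns strlen(target) + 1 entries: table[0] = -1 and, for i >= 1,
// table[i] = length of the longest proper prefix of target[0..i-1]
// that is also a suffix of it. The caller must delete[] the result.
int *build_jump_table(const char *target)
{
    if(!target)
        return nullptr;
    // Plain new throws std::bad_alloc on failure rather than returning
    // NULL, so no null check is needed after it.
    int *table = new int[strlen(target) + 1];
    table[0] = -1; /* unused by the matcher, just used here */

    for(int i = 0; target[i] != '\0'; i++) {
        // Try to extend the border of target[0..i-1] by target[i].
        table[i+1] = table[i] + 1;
        // On mismatch, fall back to the next shorter border.
        while(table[i+1] > 0 && target[i] != target[table[i+1] - 1]) {
            table[i + 1] = table[table[i + 1] - 1] + 1;
        }
    }
    return table;
}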

This is quite verbose and can be simplified a lot once you understand the concept behind the jump table.
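As one illustration of a more compact formulation, here is a sketch assuming std::string input and std::vector output in place of raw pointers. jump_table and kmp_find_all are hypothetical names. jump_table produces the same layout as build_jump_table (table[0] = -1, then border lengths), and kmp_find_all uses it to report every shift where the needle occurs, including overlapping ones. This particular matcher does read table[0]: a mismatch at needle position 0 sets j to -1, and the following ++j resets it to 0, which avoids a special case.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical compact builder: table[0] = -1, table[i] = border length
// of needle[0..i-1] (same layout as build_jump_table).
std::vector<int> jump_table(const std::string& needle) {
    std::vector<int> table(needle.size() + 1);
    table[0] = -1;
    int k = -1;  // border length of the prefix processed so far
    for (std::size_t i = 0; i < needle.size(); ++i) {
        // Shrink the border until needle[k] extends it (k == -1 always stops).
        while (k >= 0 && needle[k] != needle[i])
            k = table[k];
        table[i + 1] = ++k;
    }
    return table;
}

// Returns every shift s with text[s .. s + m - 1] == needle.
// Assumption: an empty needle reports no matches.
std::vector<std::size_t> kmp_find_all(const std::string& text,
                                      const std::string& needle) {
    std::vector<std::size_t> shifts;
    if (needle.empty())
        return shifts;
    const std::vector<int> table = jump_table(needle);
    int j = 0;  // number of needle characters currently matched
    for (std::size_t i = 0; i < text.size(); ++i) {
        // On mismatch, keep the longest border of the matched part.
        while (j >= 0 && needle[j] != text[i])
            j = table[j];
        ++j;
        if (j == static_cast<int>(needle.size())) {
            shifts.push_back(i + 1 - needle.size());
            j = table[j];  // continue, allowing overlapping matches
        }
    }
    return shifts;
}
```

With the strings from the question, kmp_find_all("abababacaba", "ababaca") returns a single shift, 2.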
