简体   繁体   中英

KMP DFA restart state

I am referring "Algorithms fourth Edition by Sedgewick & Wyane" String matching Chapter 5 .

The given algorithm is KMP substring search in which it build a DFA from pattern state . I understand the algorithm for building the DFA , code is as follows :

public KMP(String pat) {
        this.R = 256;
        this.pat = pat;

        // build DFA from pattern
        int m = pat.length();
        dfa = new int[R][m]; 
        dfa[pat.charAt(0)][0] = 1; 
        for (int x = 0, j = 1; j < m; j++) {
            for (int c = 0; c < R; c++) 
                dfa[c][j] = dfa[c][x];     // Copy mismatch cases. 
            dfa[pat.charAt(j)][j] = j+1;   // Set match case. 
            x = dfa[pat.charAt(j)][x];     // Update restart state. 
        } 
    } 

I am not able to get the following line : x = dfa[pat.charAt(j)][x]; // Update restart state. x = dfa[pat.charAt(j)][x]; // Update restart state.

I understand that this value is achieved by feeding the pat[1..j-1] in partial build DFA but not able to get that the code,how it is achieving this.

I also understand that x is the length of longest prefix of pattern that the also suffix.

I have seen many other related question but those are related to understand the algorithm itself.

I need to understand that how x = dfa[pat.charAt(j)][x]; // Update restart state. x = dfa[pat.charAt(j)][x]; // Update restart state. simulating the restart state .

If we look carefully, X is initialized to state 0, and J is to state 1

Now, we just keep moving both forward based on next character visited, and since X is behind J he already knows which state is next, by default ALL ARE POINTING BACK TO 0 so that line will always maintain the prefix, if any otherwise restart at 0

dfa[c][j] = dfa[c][x]; // Copy mismatch cases. This line is just creating failure or back pointers

x = dfa[pat.charAt(j)][x]; // Update restart state. And this line is moving the prefix ahead, to stay in sync with J, so it always point to a place where prefix == suffix

perhaps this would help further https://labuladong.gitbook.io/algo-en/i.-dynamic-programming/kmpcharactermatchingalgorithmindynamicprogramming

First, you should know the meaning of X:

  • before we update it, it means the state(how many characters are successfully matched) we'll go to from current state(j characters matched)
  • after we update it, it means the state we'll go to from next state(j + 1 characters matched)

Then

The update of X is caused by the successful matching of the txt[i] and pat[j], attention, what state they need to be match successfully (state determines the x , the character need here determines the pat.charAt(j) of the x = dfa[pat.charAt(j)][x]) , in the state that the first match fails, the state becomce the origin X , because we need to match the txt[i + 1] instead of txt[i] in the next loop in search()

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM