KMP DFA restart state

Question

I am referring to "Algorithms, Fourth Edition" by Sedgewick & Wayne, Chapter 5 (string matching).

The given algorithm is KMP substring search, which builds a DFA from the pattern. I understand the algorithm for building the DFA; the code is as follows:

public KMP(String pat) {
    this.R = 256;
    this.pat = pat;

    // build DFA from pattern
    int m = pat.length();
    dfa = new int[R][m];
    dfa[pat.charAt(0)][0] = 1;
    for (int x = 0, j = 1; j < m; j++) {
        for (int c = 0; c < R; c++)
            dfa[c][j] = dfa[c][x];     // Copy mismatch cases.
        dfa[pat.charAt(j)][j] = j+1;   // Set match case.
        x = dfa[pat.charAt(j)][x];     // Update restart state.
    }
}

I am not able to understand the following line: x = dfa[pat.charAt(j)][x]; // Update restart state.

I understand that this value is obtained by feeding pat[1..j-1] into the partially built DFA, but I cannot see how the code achieves this.

I also understand that x is the length of the longest prefix of the pattern that is also a suffix.

I have seen many other related questions, but those are about understanding the algorithm itself.

I need to understand how x = dfa[pat.charAt(j)][x]; // Update restart state. simulates the restart state.

Answer 1

If we look carefully, x is initialized to state 0 and j to state 1.

Now we just keep moving both forward on the next pattern character. Since x is always behind j, its column of the DFA is already filled in, so it already knows which state comes next. By default all entries point back to 0 (Java zero-initializes the array), so that line always maintains the prefix if there is one, and otherwise restarts at 0.

dfa[c][j] = dfa[c][x]; // Copy mismatch cases. This line just creates the failure (back) pointers.

x = dfa[pat.charAt(j)][x]; // Update restart state. This line moves the prefix ahead to stay in sync with j, so x always points to the place where prefix == suffix.

Perhaps this would help further: https://labuladong.gitbook.io/algo-en/i.-dynamic-programming/kmpcharactermatchingalgorithmindynamicprogramming
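To make the "prefix == suffix" claim concrete, here is a small sketch (not from the book) that records x at the start of each iteration j and compares it with a brute-force search for the longest proper prefix of pat[0..j) that is also a suffix of it. The class name BorderCheck and the helpers restartStates and longestBorder are made up for this illustration, and the pattern is assumed to contain only characters below 256.

```java
class BorderCheck {
    static final int R = 256; // alphabet size, as in the question's code

    // Same loop as the constructor in the question, but it also records
    // the value of x at the start of iteration j in xs[j] (xs[0] is unused).
    static int[] restartStates(String pat) {
        int m = pat.length();
        int[][] dfa = new int[R][m];
        int[] xs = new int[m];
        dfa[pat.charAt(0)][0] = 1;
        for (int x = 0, j = 1; j < m; j++) {
            xs[j] = x;
            for (int c = 0; c < R; c++)
                dfa[c][j] = dfa[c][x];     // Copy mismatch cases.
            dfa[pat.charAt(j)][j] = j + 1; // Set match case.
            x = dfa[pat.charAt(j)][x];     // Update restart state.
        }
        return xs;
    }

    // Brute force: the largest k < j such that pat[0..k) equals pat[j-k..j).
    static int longestBorder(String pat, int j) {
        for (int k = j - 1; k > 0; k--)
            if (pat.startsWith(pat.substring(j - k, j)))
                return k;
        return 0;
    }

    public static void main(String[] args) {
        String pat = "ABABAC";
        int[] xs = restartStates(pat);
        for (int j = 1; j < pat.length(); j++)
            System.out.println("j=" + j + "  x=" + xs[j]
                    + "  longest border of \"" + pat.substring(0, j) + "\" = "
                    + longestBorder(pat, j));
    }
}
```

For "ABABAC" the two columns agree for every j (x is 0, 0, 1, 2, 3 for j = 1..5), which is what "x points to the place where prefix == suffix" means in practice.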

Answer 2

First, you should know the meaning of x:

before we update it, it is the state (how many characters are successfully matched) we restart from when we are in the current state (j characters matched)
after we update it, it is the state we restart from when we are in the next state (j + 1 characters matched)

Then

The update of x is triggered by a successful match of txt[i] and pat[j]. Pay attention to what that match needs: the state supplies the x, and the character required for the match supplies the pat.charAt(j), in x = dfa[pat.charAt(j)][x]. If the match had failed, the state would have fallen back to the original x. After a successful match, the next loop iteration in search() looks at txt[i + 1] instead of txt[i], so the restart state has to advance over that same character as well.
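One way to check this reading is a sketch like the following, which assumes the same construction loop as in the question. It records x at the start of each iteration j, then feeds pat[1..j-1] to the finished DFA starting from state 0 and compares the two. The names RestartStateDemo, buildDfa and simulate are invented for the illustration; the pattern "ABABAC" is just a sample input.

```java
class RestartStateDemo {
    static final int R = 256; // alphabet size, as in the question's code

    // Builds the DFA with the question's loop and stores the value of x
    // at the start of iteration j in xs[j] (xs[0] is unused).
    static int[][] buildDfa(String pat, int[] xs) {
        int m = pat.length();
        int[][] dfa = new int[R][m];
        dfa[pat.charAt(0)][0] = 1;
        for (int x = 0, j = 1; j < m; j++) {
            xs[j] = x;
            for (int c = 0; c < R; c++)
                dfa[c][j] = dfa[c][x];     // Copy mismatch cases.
            dfa[pat.charAt(j)][j] = j + 1; // Set match case.
            x = dfa[pat.charAt(j)][x];     // One more DFA step: x now covers pat[1..j].
        }
        return dfa;
    }

    // Runs the DFA from state 0 over s[from..to) and returns the final state.
    static int simulate(int[][] dfa, String s, int from, int to) {
        int state = 0;
        for (int i = from; i < to; i++)
            state = dfa[s.charAt(i)][state];
        return state;
    }

    public static void main(String[] args) {
        String pat = "ABABAC";
        int[] xs = new int[pat.length()];
        int[][] dfa = buildDfa(pat, xs);
        for (int j = 1; j < pat.length(); j++)
            System.out.println("j=" + j + "  x=" + xs[j]
                    + "  state after pat[1.." + (j - 1) + "] = "
                    + simulate(dfa, pat, 1, j));
    }
}
```

The recorded x and the simulated state agree for every j. The update line is therefore nothing more than one extra step of that simulation: instead of re-reading pat[1..j] from scratch for the next iteration, the loop keeps the state reached so far and advances it by the single new character pat.charAt(j). That step only reads column x, which is always to the left of column j + 1 and so has already been filled in.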
