KMP DFA prefix function

Question

I was asked to learn about the KMP DFA. My book gives the implementation below, but our lecturer keeps talking about "the prefix function". I can't tell which part of this code that function is. Can someone explain it to me? Sorry if this has been asked before; I couldn't find it.

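public class KMP {
    private final String pat;   // the pattern being searched for
    private final int[][] fsm;  // fsm[c][j] = next state when reading char c in state j

    public static final int ALPHABET = 256; // extended ASCII; chars >= 256 are not supported

    public KMP(String pat) {
        if (pat.isEmpty()) {
            throw new IllegalArgumentException("pattern must be non-empty");
        }
        this.pat = pat;
        char[] pattern = pat.toCharArray();
        int M = pattern.length;

        fsm = new int[ALPHABET][M];
        fsm[pattern[0]][0] = 1; // from state 0, only pattern[0] advances

        // X is the restart state: the state the DFA is in after reading pattern[1..j-1]
        for (int X = 0, j = 1; j < M; j++) {
            for (int c = 0; c < ALPHABET; c++) {
                fsm[c][j] = fsm[c][X];  // mismatch: copy transitions from state X
            }
            fsm[pattern[j]][j] = j + 1; // match: advance to state j + 1
            X = fsm[pattern[j]][X];     // update the restart state
        }
    }

    public void search(String t) {
        int N = t.length();
        int M = pat.length();

        for (int i = 0, j = 0; i < N; i++) {
            j = fsm[t.charAt(i)][j];
            if (j == M) {
                System.out.println("Found at " + (i - M + 1));
                j = 0; // restart after a full match (overlapping matches are skipped)
            }
        }
    }
}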

Answer 1

The classic KMP algorithm does not construct a DFA. What you have implemented looks more like a DFA that recognizes occurrences of the string pattern.
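The two approaches are closely related, though. In the question's constructor, the restart state X after processing column j is the state the DFA reaches on pattern[1..j], which is exactly the prefix function value pf[j] defined below. As a minimal sketch (the class DfaPrefix and the method restartStates are hypothetical names, and the 256-character alphabet is carried over from the question), recording X at each step yields the prefix function:

```java
import java.util.Arrays;

public class DfaPrefix {
    static final int ALPHABET = 256; // same assumption as the question's KMP class

    // Builds the KMP DFA as in the question and records the restart
    // state X after each column j. X is the state the DFA reaches after
    // reading pattern[1..j], i.e. the prefix function value pf[j].
    static int[] restartStates(String pat) {
        int m = pat.length();
        int[][] fsm = new int[ALPHABET][m];
        int[] pf = new int[m];                 // pf[0] = 0 by definition
        fsm[pat.charAt(0)][0] = 1;
        for (int x = 0, j = 1; j < m; j++) {
            for (int c = 0; c < ALPHABET; c++) {
                fsm[c][j] = fsm[c][x];         // mismatch transitions copied from state x
            }
            fsm[pat.charAt(j)][j] = j + 1;     // match transition
            x = fsm[pat.charAt(j)][x];         // advance the restart state
            pf[j] = x;                         // this is pf[j]
        }
        return pf;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(restartStates("abacabacada")));
    }
}
```

For "abacabacada" this prints [0, 0, 1, 0, 1, 2, 3, 4, 5, 0, 1], the same values as the example below.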

The idea behind the KMP algorithm is to construct the so-called prefix function for the given pattern. What is this function? For each position i of the pattern, pf[i] is the length of the longest suffix of pattern[1..i] that is also a prefix of the whole pattern (indices are 0-based, so pattern[1..i] omits the first character and the suffix is always a proper suffix of pattern[0..i]). This may sound confusing, so here is an example:
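Read literally, the definition turns into a brute-force computation: for every i, try suffix lengths from longest to shortest and keep the first one that matches the start of the pattern. This is only a sketch for checking one's understanding (cubic time in the worst case), not how KMP computes pf; the class name PrefixFunctionNaive is hypothetical:

```java
import java.util.Arrays;

public class PrefixFunctionNaive {
    // pf[i] = length of the longest suffix of pattern[1..i] (0-indexed)
    // that is also a prefix of pattern. Such a suffix has length at most i,
    // so it is always a proper suffix of pattern[0..i].
    static int[] prefixFunction(String pattern) {
        int m = pattern.length();
        int[] pf = new int[m];
        for (int i = 0; i < m; i++) {
            // Try the longest candidate first; a suffix of length k starts at i - k + 1.
            for (int k = i; k > 0; k--) {
                // Does pattern[i-k+1..i] equal pattern[0..k-1]?
                if (pattern.regionMatches(i - k + 1, pattern, 0, k)) {
                    pf[i] = k;
                    break;
                }
            }
        }
        return pf;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(prefixFunction("abacabacada")));
    }
}
```

Running main prints [0, 0, 1, 0, 1, 2, 3, 4, 5, 0, 1] for the pattern used in the example that follows.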

The prefix function of pattern = "abacabacada" is pf[] = 0 0 1 0 1 2 3 4 5 0 1. pf[8] equals 5 because the longest suffix of "bacabaca" (that is, pattern[1..8]) that is also a prefix of "abacabacada" is "abaca", which has length 5. Similarly, pf[9] = 0 because no non-empty suffix of "bacabacad" is also a prefix of "abacabacada" (the pattern).

I hope this explanation makes the prefix function clearer. Some people name the array that stores the prefix function fl, short for "fail link", because during matching its values are used only when a character of the text and a character of the pattern mismatch.

Here is a clear implementation of the algorithm (in Java).
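A minimal sketch of the idea (the class KmpPrefix and the methods failLinks and search are hypothetical names): failLinks computes the prefix function in linear time by falling back along earlier fail links, and search consults fl only when a text character mismatches the pattern, plus once after each full match so that overlapping occurrences are reported. Unlike the DFA version, it needs no alphabet-sized table, so it works for any Java char.

```java
import java.util.ArrayList;
import java.util.List;

public class KmpPrefix {
    // fl[i] = length of the longest proper suffix of pattern[0..i]
    // that is also a prefix of pattern (the prefix function).
    static int[] failLinks(String pattern) {
        int m = pattern.length();
        int[] fl = new int[m];
        int k = 0; // length of the prefix currently matched
        for (int i = 1; i < m; i++) {
            // On mismatch, fall back along earlier fail links.
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
                k = fl[k - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            fl[i] = k;
        }
        return fl;
    }

    // Returns the start index of every (possibly overlapping) occurrence.
    static List<Integer> search(String text, String pattern) {
        List<Integer> hits = new ArrayList<>();
        int m = pattern.length();
        if (m == 0) return hits; // this sketch reports nothing for an empty pattern
        int[] fl = failLinks(pattern);
        int j = 0; // number of pattern characters matched so far
        for (int i = 0; i < text.length(); i++) {
            // fl is consulted only on a mismatch...
            while (j > 0 && text.charAt(i) != pattern.charAt(j)) {
                j = fl[j - 1];
            }
            if (text.charAt(i) == pattern.charAt(j)) {
                j++;
            }
            if (j == m) {
                hits.add(i - m + 1);
                j = fl[j - 1]; // ...and after a full match, keeping the longest border
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(search("abababa", "aba"));
    }
}
```

main prints [0, 2, 4]: "aba" occurs at indices 0, 2 and 4 of "abababa", and each later occurrence overlaps the previous one.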

Answered 2013-11-24 23:04:22 (2 votes)

KMP DFA前缀功能

问题描述

1 个解决方案

解决方案1 2 2013-11-24 23:04:22

解决方案1
2 2013-11-24 23:04:22