KMP DFA prefix function
I was asked to learn about the KMP DFA, and what I found in my book is the implementation below, but our lecturer keeps referring to something called "the prefix function". I really can't work out which part of this code is that function. Can someone explain it to me?
I'm sorry if this has been asked before, but I couldn't find it.
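public class KMP {
    // Assumes every character code in the pattern and the text is below 256.
    public static final int ALPHABET = 256;

    private String pat;
    private String t;
    // fsm[c][j] is the next state after reading character c in state j.
    private int[][] fsm;

    public KMP(String pat) {
        this.pat = pat;
        char[] pattern = pat.toCharArray();
        int M = pattern.length;
        fsm = new int[ALPHABET][M];
        fsm[pattern[0]][0] = 1;
        // X is the restart state: the state reached after reading pattern[1..j-1].
        for (int X = 0, j = 1; j < M; j++) {
            for (int c = 0; c < ALPHABET; c++) {
                fsm[c][j] = fsm[c][X];      // mismatch: behave as in state X
            }
            fsm[pattern[j]][j] = j + 1;     // match: advance to the next state
            X = fsm[pattern[j]][X];         // update the restart state
        }
        // display(fsm);  // debugging helper, not defined in this snippet
    }

    public void search(String t) {
        this.t = t;
        int N = t.length();
        int M = pat.length();
        for (int i = 0, j = 0; i < N; i++) {
            j = fsm[t.charAt(i)][j];
            if (j == M) {
                System.out.println("Found at " + (i - M + 1));
                j = 0;  // restart from state 0, so overlapping matches are not reported
            }
        }
    }
}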
The KMP algorithm, in its usual formulation, does not construct a DFA. What you have implemented looks more like a DFA that recognizes the string pattern.
The idea behind the KMP algorithm is to construct the so-called prefix function for the given pattern. And what is this function? Its definition is that, for each position i of the string (0-indexed), we are interested in the length of the longest suffix of pattern[1..i] that is also a prefix of the pattern string. Starting the range at index 1 rather than 0 excludes the trivial case in which the whole of pattern[0..i] matches itself. This may sound confusing, but here is an example:
The prefix function of pattern = "abacabacada" is pf[] = 0 0 1 0 1 2 3 4 5 0 1. pf[8] is equal to 5 because the longest suffix of "bacabaca" that is also a prefix of "abacabacada" is "abaca", which has length 5. Similarly, pf[9] = 0 because no non-empty suffix of "bacabacad" is also a prefix of "abacabacada" (the pattern).
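To make the definition concrete, here is a minimal sketch of how the prefix function can be computed in linear time. The class name PrefixFunctionDemo and the method name prefixFunction are my own choices for illustration, not names from your book or lecture.

```java
import java.util.Arrays;

public class PrefixFunctionDemo {
    // pf[i] = length of the longest suffix of pattern[1..i]
    // that is also a prefix of pattern (0-indexed).
    public static int[] prefixFunction(String pattern) {
        int m = pattern.length();
        int[] pf = new int[m];
        // k is the length of the current candidate prefix.
        for (int i = 1, k = 0; i < m; i++) {
            // Shrink the candidate until it can be extended by pattern[i].
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
                k = pf[k - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            pf[i] = k;
        }
        return pf;
    }

    public static void main(String[] args) {
        // Prints [0, 0, 1, 0, 1, 2, 3, 4, 5, 0, 1]
        System.out.println(Arrays.toString(prefixFunction("abacabacada")));
    }
}
```

The inner while loop reuses values that were already computed, which is why the whole table costs O(M) for a pattern of length M.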
I hope this explanation makes the prefix function clearer. Some people name the array that stores the prefix function fl, short for "fail link", because during matching the values in this array are used only when the characters from text and pattern mismatch.
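As a small illustration of what following a fail link means, the sketch below lists the matched lengths the algorithm falls back to after a mismatch. The fl values are hard-coded from the "abacabacada" example above; the class name FailLinkDemo and the method fallbackChain are hypothetical names used only here. In a real search the chain stops as soon as the next pattern character matches the text character.

```java
import java.util.ArrayList;
import java.util.List;

public class FailLinkDemo {
    // Returns the successive matched lengths tried after a mismatch
    // that occurs when `matched` pattern characters had already matched.
    public static List<Integer> fallbackChain(int[] fl, int matched) {
        List<Integer> chain = new ArrayList<>();
        int j = matched;
        while (j > 0) {
            j = fl[j - 1];  // follow the fail link
            chain.add(j);
        }
        return chain;
    }

    public static void main(String[] args) {
        // Prefix function of "abacabacada", copied from the example.
        int[] fl = {0, 0, 1, 0, 1, 2, 3, 4, 5, 0, 1};
        // Nine characters ("abacabaca") matched, then a mismatch.
        // Prints [5, 1, 0]
        System.out.println(fallbackChain(fl, 9));
    }
}
```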
Here is a clear implementation of the algorithm (in Java).
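For readers who only have this text, a minimal sketch of the matching loop driven by the prefix function could look like the following. It is my own illustration and not necessarily identical to the implementation referred to above; the names KmpSearchSketch, prefixFunction and search are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class KmpSearchSketch {
    // Prefix function (fail links) of the pattern.
    public static int[] prefixFunction(String pattern) {
        int m = pattern.length();
        int[] fl = new int[m];
        for (int i = 1, k = 0; i < m; i++) {
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
                k = fl[k - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            fl[i] = k;
        }
        return fl;
    }

    // Returns the start index of every occurrence of pattern in text,
    // including overlapping ones.
    public static List<Integer> search(String text, String pattern) {
        List<Integer> hits = new ArrayList<>();
        if (pattern.isEmpty()) {
            return hits;  // design choice for this sketch: no matches reported
        }
        int[] fl = prefixFunction(pattern);
        int j = 0;  // number of pattern characters currently matched
        for (int i = 0; i < text.length(); i++) {
            // On a mismatch, fall back along the fail links.
            while (j > 0 && text.charAt(i) != pattern.charAt(j)) {
                j = fl[j - 1];
            }
            if (text.charAt(i) == pattern.charAt(j)) {
                j++;
            }
            if (j == pattern.length()) {
                hits.add(i - j + 1);
                j = fl[j - 1];  // keep going so overlapping matches are found
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(search("abacabacabacada", "abacabacada"));  // prints [4]
        System.out.println(search("aaaa", "aa"));                      // prints [0, 1, 2]
    }
}
```

Unlike the table-based version in the question, this needs only O(M) extra memory for a pattern of length M, independent of the alphabet size.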