
Finite Automata String Matcher

I am trying to build a finite-automaton (FA) string matcher in Java. I have the following pseudocode.

[Image: Finite-Automata-Matcher pseudocode]

For the Finite-Automata-Matcher algorithm to work, the transition function has to be computed first. The following algorithm, Compute-Transition-Function, computes it given the pattern P and the alphabet sigma.

[Image: Compute-Transition-Function pseudocode]

In the above code I couldn't understand where min(m + 1, q + 2) comes from. (I do understand why it is k = min(m + 1, q + 2) instead of k = min(m, q + 1), but why do we want the minimum of m and q + 1 in the first place?)

Lines 5-7 decrease k by one until Pk is a suffix of Pqa, but I couldn't understand what Pqa stands for.

Also, how can I convert line 8 to Java code? Would a two-dimensional array be sufficient, or do I need another data structure?

A related question: string matching with finite automata

In the inner repeat-until loop, say we have Pq = 'abdab', the pattern is 'abdabcd', and the alphabet is {a, b, c, d}. For every symbol of the alphabet we look for the best alternative, that is, the longest prefix of the pattern that is still a suffix of what has been read, and then store a transition to that state. In the case above, on 'a' we move back near the beginning (state 1), on 'b' to the very beginning (state 0), 'c' prolongs the match (state 6), and 'd' stores a pointer to the third symbol of the pattern (state 3). So Pqa should be read as Pq followed by the character a from the alphabet.
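The loop described above, together with the two-dimensional array asked about for line 8, could be sketched in Java roughly as follows. This is only a minimal sketch under some assumptions: the class name `FaMatcherSketch`, the method names, and the choice to pass the alphabet as a `String` are illustrative and not taken from the pseudocode; the pattern is assumed to be non-empty; and resetting to state 0 on a character outside the alphabet is my own choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch; names are hypothetical, not from the original pseudocode.
final class FaMatcherSketch {

    // delta[q][i] is the next state from state q on the symbol alphabet.charAt(i).
    static int[][] computeTransitionFunction(String pattern, String alphabet) {
        int m = pattern.length();
        int[][] delta = new int[m + 1][alphabet.length()];
        for (int q = 0; q <= m; q++) {
            for (int i = 0; i < alphabet.length(); i++) {
                // Pqa: the first q characters of the pattern followed by the symbol a.
                String pqa = pattern.substring(0, q) + alphabet.charAt(i);
                int k = Math.min(m + 1, q + 2);
                do {
                    k--; // repeat ... until decrements before it tests
                } while (!pqa.endsWith(pattern.substring(0, k)));
                delta[q][i] = k; // line 8: store the transition in the table
            }
        }
        return delta;
    }

    // Returns every 0-based shift at which the pattern occurs in the text.
    static List<Integer> match(String text, String pattern, String alphabet) {
        int[][] delta = computeTransitionFunction(pattern, alphabet);
        int m = pattern.length();
        List<Integer> shifts = new ArrayList<>();
        int q = 0;
        for (int i = 0; i < text.length(); i++) {
            int column = alphabet.indexOf(text.charAt(i));
            // Assumption: a character outside the alphabet resets the automaton.
            q = (column < 0) ? 0 : delta[q][column];
            if (q == m) {
                shifts.add(i - m + 1);
            }
        }
        return shifts;
    }

    public static void main(String[] args) {
        int[][] delta = computeTransitionFunction("abdabcd", "abcd");
        System.out.println(Arrays.toString(delta[5]));            // prints [1, 0, 6, 3]
        System.out.println(match("abdabdabcd", "abdabcd", "abcd")); // prints [3]
    }
}
```

Row 5 of the table matches the example: from Pq = 'abdab', the symbols a, b, c, d lead to states 1, 0, 6 and 3. A plain `int[][]` is enough when the alphabet is small and fixed; for a large or sparse alphabet a `Map<Character, Integer>` per state would be a possible alternative.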

Edit: why do we want the min of q + 2 and m + 1? We want to be able to move one step forward, and we also have to stay within the length of the pattern, which is m. (The repeat-until loop decrements k before it tests, so the first value actually tried is min(m, q + 1).) Why can't we start from q + 3 or q + 4? Because we add just one character, and a single character cannot extend the best match from q to q + 2 or q + 3.
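The same bound can be written out as a short derivation. The notation here is an assumption borrowed from the usual textbook presentation: $P_k$ is the prefix of $P$ of length $k$, and $x \sqsupset y$ means "$x$ is a suffix of $y$".

```latex
\begin{align*}
\delta(q, a) &= \max\{\, k : P_k \sqsupset P_q a \,\} \\
P_k \sqsupset P_q a &\implies k \le |P_q a| = q + 1 \\
P_k \text{ is a prefix of } P &\implies k \le |P| = m \\
\text{hence}\quad \delta(q, a) &\le \min(m,\; q + 1)
\end{align*}
```

Since the loop subtracts one from k before the first suffix test, the starting value has to be one larger: min(m, q + 1) + 1 = min(m + 1, q + 2).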


 