Regular expression that generates a DFA with dead or superfluous states

Original 2012-02-20 10:08:05 9 1 regex/ dfa/ nfa

I'm looking to implement a DFA minimizer in my lexer, but I can't seem to produce a DFA that doesn't already look like the minimal DFA for the expression.

I'm constructing the DFA from an NFA that is built using Thompson construction from a postfix regular expression. It's pretty much exactly what is described in the dragon book. To make the lexer, several of the NFAs are combined using epsilon transitions from the start state. It's on this combined NFA that the DFA algorithm is applied.
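For readers following along, the "combine several NFAs with epsilon transitions from the start state" step can be sketched roughly as below. This is only an illustration in Python, not the code from the linked repository: the dictionary layout (`start`, `accept`, `size`, `edges`), the `combine` function, and the token names are all assumptions made up for this sketch.

```python
EPS = None  # assumed label for epsilon transitions

def combine(token_nfas):
    """Merge per-token NFAs under one new start state (state 0).

    Each NFA is a dict with local integer states 0..size-1, a single
    'start', a single 'accept', and 'edges' as (src, label, dst) triples.
    """
    edges = []
    accept_token = {}   # accepting state -> token it recognizes
    offset = 1          # state 0 is reserved for the shared start state
    for token, nfa in token_nfas:
        # Epsilon edge from the shared start into this token's NFA.
        edges.append((0, EPS, nfa["start"] + offset))
        # Renumber the NFA's own edges so states do not collide.
        edges += [(s + offset, label, d + offset) for s, label, d in nfa["edges"]]
        accept_token[nfa["accept"] + offset] = token
        offset += nfa["size"]
    return {"start": 0, "edges": edges, "accept_token": accept_token}

# Two toy single-character tokens, purely for demonstration.
letter = {"start": 0, "accept": 1, "size": 2, "edges": [(0, "a", 1)]}
digit = {"start": 0, "accept": 1, "size": 2, "edges": [(0, "1", 1)]}
combined = combine([("IDENT", letter), ("NUMBER", digit)])
```

Keeping a map from accepting state to token is one way to let the later DFA report which token matched; other designs are possible.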

So, is there any "known" regular expression that will generate a DFA which will make a nice test bed for dead state elimination and state minimization?

I could of course just hack up a weird DFA and apply the algorithms to it, but that would not really be a proper test case, would it? If it turns out that the method I'm using to construct DFAs isn't prone to dead states, then that information would be just as valuable, since then I can skip implementing the state elimination feature altogether.

Edit: In case you need implementation details in order to answer accurately, the code is available on GitHub, specifically the NFA.cs and DFA.cs classes. Additionally, I wrote a series of blog posts on the construction algorithm I'm using, if that helps.

1 Answer

OK, so I found this out in a totally roundabout way. I made a tool for visualizing regular expressions, since I was getting quite nice debug output from my parser. It aptly illustrates an expression for which standard Thompson construction techniques give you a pretty stupid automaton: (a+b+c+)+|abc

Shown in the tool: http://regexvisualizer.apphb.com/?Regex=%28a%2Bb%2Bc%2B%29%2B%7Cabc&NfaSize=300&DfaSize=250#

This tool currently performs a straight-up Thompson construction without any optimization. If you remove the |abc part of the expression, which is entirely superfluous, the resulting automaton should stay the same. It doesn't.
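To reproduce the effect without the visualizer, here is a minimal Python sketch. It is not the tool's or the repository's code. It builds a Thompson-style NFA from hand-written combinators (`lit`, `concat`, `alt`, `plus`, all hypothetical names), runs a plain subset construction, and counts the states before and after a Moore-style partition refinement. The DFA is kept partial, so a missing transition is treated as going to an implicit dead state.

```python
from itertools import count

EPS = None  # assumed label for epsilon transitions

class NFA:
    def __init__(self):
        self.trans = {}      # state -> list of (label, target)
        self.ids = count()
    def new_state(self):
        s = next(self.ids)
        self.trans[s] = []
        return s
    def edge(self, src, label, dst):
        self.trans[src].append((label, dst))

# Each fragment is a (start, accept) pair of NFA states.
def lit(n, ch):
    s, a = n.new_state(), n.new_state()
    n.edge(s, ch, a)
    return s, a

def concat(n, f, g):
    n.edge(f[1], EPS, g[0])
    return f[0], g[1]

def alt(n, f, g):
    s, a = n.new_state(), n.new_state()
    n.edge(s, EPS, f[0]); n.edge(s, EPS, g[0])
    n.edge(f[1], EPS, a); n.edge(g[1], EPS, a)
    return s, a

def plus(n, f):
    s, a = n.new_state(), n.new_state()
    n.edge(s, EPS, f[0])
    n.edge(f[1], EPS, a)
    n.edge(f[1], EPS, f[0])  # loop back for "one or more"
    return s, a

def closure(n, states):
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for label, t in n.trans[s]:
            if label is EPS and t not in seen:
                seen.add(t)
                stack.append(t)
    return frozenset(seen)

def to_dfa(n, frag, alphabet):
    # Subset construction; DFA states are frozensets of NFA states.
    start = closure(n, {frag[0]})
    dfa, todo = {}, [start]
    while todo:
        cur = todo.pop()
        if cur in dfa:
            continue
        dfa[cur] = {}
        for ch in alphabet:
            move = {t for s in cur for label, t in n.trans[s] if label == ch}
            if move:
                dfa[cur][ch] = closure(n, move)
                todo.append(dfa[cur][ch])
    return dfa, {s for s in dfa if frag[1] in s}

def minimized_size(dfa, accepting, alphabet):
    # Moore refinement: split blocks until signatures stop changing.
    block = {s: int(s in accepting) for s in dfa}
    while True:
        sig = {s: (block[s],) + tuple(block.get(dfa[s].get(ch), -1)
                                      for ch in alphabet) for s in dfa}
        ids = {}
        new_block = {s: ids.setdefault(sig[s], len(ids)) for s in dfa}
        if len(ids) == len(set(block.values())):
            return len(ids)
        block = new_block

def sizes(with_abc):
    n = NFA()
    inner = concat(n, concat(n, plus(n, lit(n, "a")), plus(n, lit(n, "b"))),
                   plus(n, lit(n, "c")))
    frag = plus(n, inner)                       # (a+b+c+)+
    if with_abc:
        abc = concat(n, concat(n, lit(n, "a"), lit(n, "b")), lit(n, "c"))
        frag = alt(n, frag, abc)                # (a+b+c+)+|abc
    dfa, accepting = to_dfa(n, frag, "abc")
    return len(dfa), minimized_size(dfa, accepting, "abc")

if __name__ == "__main__":
    print(sizes(False), sizes(True))
```

With this particular construction, `sizes(False)` gives `(4, 4)` and `sizes(True)` gives `(7, 4)`: the `|abc` branch adds three DFA states that minimization merges away again. The exact counts depend on how the NFA fragments are wired together, so a different Thompson variant may produce different raw numbers.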