DFA to regular expression time complexity

Original question posted 2013-04-19 00:20:13. Tags: regex, algorithm, time-complexity, automata

I am looking at the time complexity analysis of converting DFAs to regular expressions in "Introduction to Automata Theory, Languages, and Computation" by Ullman et al. (2nd edition, page 151). This method is sometimes referred to as the transitive closure method. I don't understand how they came up with the 4^n factor in the O(n^3 · 4^n) time complexity.

I understand that the 4^n factor holds for the space complexity, but for the time complexity it seems that at each iteration we perform only four constant-time operations per pair of states, reusing the results of the previous iteration. What exactly am I missing?

2 Answers

It's a crude bound on the complexity of an algorithm that isn't using the right data structures. I don't think there's much to explain beyond the fact that the authors clearly did not care to optimize here, probably because their main point was that regular expressions are at least as expressive as DFAs, and because they feel it's pointless to optimize this exponential-time algorithm.

There are three nested loops of n iterations each. By induction, the regular expressions constructed during iteration k of the outer loop have size O(4^k), since each one is built from at most four regular expressions constructed during the previous iteration. If the algorithm copies these subexpressions, and we overestimate the regular-expression size as O(4^n) for every iteration, then we get O(n^3 · 4^n).
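The size argument can be checked numerically. This is a minimal sketch under stated assumptions, not the book's analysis: it assumes every R_ij^(0) is a single symbol, that nothing is simplified, and that each step adds four operator tokens, so the worst-case size follows s_k = 4 s_(k-1) + 4. The helper name `copy_cost` is hypothetical. It also totals the work of copying all n^2 entries at every level.

```python
def copy_cost(n, op_tokens=4):
    """Worst-case expression size per level and total copying work.

    Assumes (hypothetically) that every R_ij^(0) is one symbol, that nothing
    is simplified, and that each step adds `op_tokens` operator tokens.
    """
    sizes = [1]  # level 0: one symbol per entry
    total = 0
    for k in range(1, n + 1):
        # Each new entry copies four level-(k-1) expressions plus operators.
        s = 4 * sizes[-1] + op_tokens
        sizes.append(s)
        # All n^2 entries are rebuilt (copied) at level k.
        total += n * n * s
    return sizes, total


if __name__ == "__main__":
    for n in (4, 8, 12):
        sizes, total = copy_cost(n)
        # The ratio against n^2 * 4^n stays bounded (geometric-sum bound);
        # the crude n^3 * 4^n bound overshoots by a factor on the order of n.
        print(n, sizes[-1], total, total / (n * n * 4 ** n))
```

Under these assumptions the last level dominates the sum, which is why the total stays within a constant factor of n^2 · 4^n.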

Obviously we can do better. Without eliminating the copying, we can get $O\left(\sum_{k=1}^{n} n^2 4^k\right) = O\left(n^2 (n + 4^n)\right)$ by bounding the geometric sum properly. Moreover, as you point out, we don't need to copy at all, except at the end if we agree with templatetypedef that the output must be written out completely. That gives a running time of O(n^3) to prepare the regular expression and O(4^n) to write it out. The space complexity for this version equals the time complexity.
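To make the "no copying" version concrete, here is a minimal Python sketch, assuming expressions are stored as shared tuple nodes (a DAG) rather than as strings, and that every R_ij^(0) is a single symbol. The names `build_shared` and `tree_size` are hypothetical. Building the DAG creates three nodes per (i, j, k), so O(n^3) nodes in total. `tree_size` computes, via memoisation, how many nodes the fully written-out expression would contain; that count grows like 4^n.

```python
def build_shared(n):
    """Run the k-path recurrence, sharing subexpressions instead of copying."""
    created = 0

    def node(op, *kids):
        nonlocal created
        created += 1
        return (op, kids)  # kids are references to existing nodes, not copies

    # Level 0: assume every R_ij^(0) is a single symbol (worst case).
    R = [[node("sym") for _ in range(n)] for _ in range(n)]
    for k in range(n):
        prev = R  # level k-1 stays intact while level k is built
        R = [[node("+",
                   node(".", prev[i][k], node("*", prev[k][k]), prev[k][j]),
                   prev[i][j])
              for j in range(n)]
             for i in range(n)]
    return R, created


def tree_size(root):
    """Number of nodes the expression would have if fully written out."""
    memo = {}  # keyed by id(): hashing nested tuples would itself be exponential

    def go(nd):
        key = id(nd)
        if key not in memo:
            _op, kids = nd
            memo[key] = 1 + sum(go(c) for c in kids)
        return memo[key]

    return go(root)
```

For example, `build_shared(10)` creates 3100 nodes, while `tree_size` of any final entry is 2 * 4**10 - 1 = 2097151: the preparation is polynomial and only writing the expression out is exponential.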

I suppose your doubt is about the n³ time complexity.

Let $R_{ij}^{(k)}$ denote the set of all strings that take the automaton from state $q_i$ to state $q_j$ without passing through any intermediate state higher than $q_k$.
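The definition can be checked by brute force on a single string. A small sketch, assuming 0-based state numbers (so "no intermediate state higher than q_k" becomes "every intermediate state is one of 0..k-1") and a DFA given as a dict `delta` from (state, symbol) to state; `in_R` is a hypothetical helper name. The start and end states themselves are not restricted.

```python
def in_R(delta, i, j, k, w):
    """Brute-force test of whether w belongs to R_ij^(k).

    Assumes (hypothetically) 0-based states, so "no intermediate state
    higher than q_k" becomes "every intermediate state is in 0..k-1".
    """
    state = i
    for pos, sym in enumerate(w):
        state = delta.get((state, sym))
        if state is None:
            return False  # missing transition: w is rejected
        if pos < len(w) - 1 and state >= k:
            return False  # an intermediate state lies outside 0..k-1
    return state == j


# Example DFA: accepts binary strings with an even number of 1s.
delta = {(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 0}
print(in_R(delta, 0, 0, 0, "11"))  # False: "11" passes through state 1
print(in_R(delta, 0, 0, 2, "11"))  # True once state 1 is allowed
```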

Then $R_{ij}^{(k)}$ can be computed iteratively with the following formula:

$$R_{ij}^{(k)} = R_{ik}^{(k-1)} \left( R_{kk}^{(k-1)} \right)^{*} R_{kj}^{(k-1)} + R_{ij}^{(k-1)}$$
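A minimal Python sketch of this recurrence, not the book's code. Assumptions: 0-based states, a DFA given as a dict `delta` from (state, symbol) to state, and output in Python `re` syntax so the result can be checked with `re.fullmatch`. `dfa_to_regex`, `union`, `concat` and `star` are illustrative names. `None` stands for the empty language ∅ and the empty string for ε; apart from the trivial identities for ∅ and ε, nothing is simplified, so the output grows quickly.

```python
import re

EMPTY = None  # the empty language ∅; the empty string "" denotes ε


def union(a, b):
    if a is EMPTY:
        return b
    if b is EMPTY or a == b:
        return a
    return f"(?:{a}|{b})"


def concat(a, b):
    if a is EMPTY or b is EMPTY:
        return EMPTY
    return a + b  # safe: unions and stars are always wrapped in a group


def star(a):
    if a is EMPTY or a == "":
        return ""  # ∅* = ε* = ε
    return f"(?:{a})*"


def dfa_to_regex(n, delta, start, finals):
    """k-path (transitive closure) construction for a DFA with states 0..n-1."""
    # Level 0: direct transitions, plus ε on the diagonal.
    R = [[EMPTY] * n for _ in range(n)]
    for (i, sym), j in delta.items():
        R[i][j] = union(R[i][j], re.escape(sym))
    for i in range(n):
        R[i][i] = union(R[i][i], "")
    # After iteration k, paths may also pass through state k (0-based).
    for k in range(n):
        prev = R  # keep the previous level intact while building the next
        R = [[union(concat(concat(prev[i][k], star(prev[k][k])), prev[k][j]),
                    prev[i][j])
              for j in range(n)]
             for i in range(n)]
    result = EMPTY
    for f in finals:
        result = union(result, R[start][f])
    return result  # None means the accepted language is empty
```

For instance, with `delta = {(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 0}`, `dfa_to_regex(2, delta, 0, [0])` yields a pattern that fully matches exactly the binary strings with an even number of 1s.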

This technique is similar to the all-pairs shortest path problem (the Floyd-Warshall algorithm). The only difference is that union and concatenation of regular expressions take the place of taking the minimum and summing distances. Floyd-Warshall runs in O(n³) time, so if each union, concatenation and star counts as a single operation, we can expect the same O(n³) bound for the DFA-to-regular-expression conversion. The same method can also be used to convert NFAs and ε-NFAs to equivalent regular expressions.
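The analogy can be made literal by writing the triple loop once with the three operations as parameters (the generic "algebraic path" view; `path_closure` and `shortest_paths` are hypothetical names). A sketch, assuming non-negative edge weights: with `min` as the union, `+` as the concatenation and a star that always returns 0 (a cycle never shortens a path), it is exactly Floyd-Warshall. Passing regex union, concatenation and star instead gives the recurrence above, with the same n³ count of operations.

```python
import math
import operator


def path_closure(A, plus, times, star):
    """Generic k-path closure of an n x n matrix over the given operations."""
    n = len(A)
    D = [row[:] for row in A]
    for k in range(n):
        prev = D  # values from the previous level, as in the recurrence
        D = [[plus(times(times(prev[i][k], star(prev[k][k])), prev[k][j]),
                   prev[i][j])
              for j in range(n)]
             for i in range(n)]
    return D


def shortest_paths(A):
    """Min-plus instance: union -> min, concatenation -> +, star -> 0."""
    return path_closure(A, min, operator.add, lambda _x: 0)


INF = math.inf
W = [[0, 4, 1, INF],
     [INF, 0, INF, 1],
     [INF, 2, 0, INF],
     [INF, INF, INF, 0]]
D = shortest_paths(W)
print(D[0])  # [0, 3, 1, 4]
```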

The main problem with the transitive closure approach is that it creates very large regular expressions. The length blows up because each step takes the union of concatenated copies of earlier terms.