
Automata - Regular Expression (Union Case)

Posted 2021-11-02 16:40:28. Tags: regex, regular-language, automata, automata-theory

Automaton 1) Recognizes strings with at least two a's

Regular Expression = b*ab*a(a+b)*

Automaton 2) Recognizes strings with at least two b's

Regular Expression = a*ba*b(a+b)*

Is the regular expression obtained from A3 = A1 ∪ A2 equivalent to R3 = R1 + R2, or is it not?

R3 = b*ab*a(a+b)* + a*ba*b(a+b)*

2 Answers

Regular expressions are not like finite-state parsers, and it is usually a mistake to try to incorporate them into complex parsing scenarios.

They are, however, marvelous tools for specific problems. Reading your requirements, there is a simple regular expression that satisfies them, though perhaps not in the way you expect. Your requirements:

strings with at least two a's
strings with at least two b's
the union of the two, that is, strings with at least two a's or at least two b's
([ab]).*?\1

This expression opens a capture group that captures either a or b. It then allows zero or more arbitrary characters, followed by whatever the capture group captured (\1).
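The backreference pattern can be tried out in Python. The sketch below is only an illustration: the helper names (`has_repeated_letter`, `reference`, `agree_up_to`) are made up for this example, and it assumes the input contains only the letters a and b. It compares the pattern against a direct count of the letters on every string up to a given length.

```python
import re
from itertools import product

# Pattern from the answer; in a raw string the backreference is written \1.
PATTERN = re.compile(r"([ab]).*?\1")

def has_repeated_letter(s: str) -> bool:
    """True if the pattern finds a letter that occurs again later in s."""
    return PATTERN.search(s) is not None

def reference(s: str) -> bool:
    """Direct statement of the requirement: two a's or two b's."""
    return s.count("a") >= 2 or s.count("b") >= 2

def agree_up_to(max_len: int) -> bool:
    """Compare both predicates on every string over {a, b} up to max_len."""
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            s = "".join(chars)
            if has_repeated_letter(s) != reference(s):
                return False
    return True
```

Note that `search` is used rather than `fullmatch`, because the pattern only describes the part of the string between the two equal letters.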

There is neither "one" automaton nor "one" regular expression for any language; generally there are many reasonable ones and many more (maybe infinitely many) unreasonable ones. In this sense, your question is not entirely well-posed: the regular expression corresponding to the union of two DFAs may or may not look like the regular expressions for the original DFAs, +'ed together.

So, if you mean "can they look the same?", the answer is likely yes. If you mean "must they look the same?", the answer is likely no. If you instead want to fix the algorithms for constructing the union machine and for deriving the regular expression, maybe we could show that a fixed method always gives the same answer.

In your specific case, suppose we apply the Cartesian Product Machine construction to get a DFA for the union of the original DFAs, and then apply the construction from the proof of equivalence between DFAs and REs. We can see that the structure of the RE obtained by +'ing the original REs can't be achieved starting from a DFA; you'd have needed an NFA to get a + between the LHS and RHS, but DFAs can only + among individual symbols, not subexpressions. Of course, it might be possible to manipulate the resulting RE algebraically to derive the target RE, but that isn't exactly the same thing.
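The product construction mentioned above can be sketched in Python. The question does not show the DFAs A1 and A2, so this sketch assumes the usual three-state machines that count occurrences of one letter and cap the count at 2; the dictionary layout and the function names (`counting_dfa`, `union_product`, `accepts`) are choices made for this illustration only.

```python
from itertools import product

ALPHABET = "ab"

def counting_dfa(letter):
    """Assumed DFA: states 0, 1, 2 count occurrences of `letter`, capped at 2."""
    delta = {(q, c): (min(q + 1, 2) if c == letter else q)
             for q in range(3) for c in ALPHABET}
    return {"states": list(range(3)), "start": 0, "accept": {2}, "delta": delta}

def union_product(d1, d2):
    """Product DFA that accepts when d1 or d2 accepts."""
    states = list(product(d1["states"], d2["states"]))
    # Both component machines read the same symbol in lockstep.
    delta = {((p, q), c): (d1["delta"][p, c], d2["delta"][q, c])
             for (p, q) in states for c in ALPHABET}
    # Union: a pair is accepting if either component is accepting.
    accept = {(p, q) for (p, q) in states
              if p in d1["accept"] or q in d2["accept"]}
    return {"states": states, "start": (d1["start"], d2["start"]),
            "accept": accept, "delta": delta}

def accepts(dfa, s):
    """Run the DFA on s and report whether it ends in an accepting state."""
    state = dfa["start"]
    for c in s:
        state = dfa["delta"][state, c]
    return state in dfa["accept"]

A1 = counting_dfa("a")   # at least two a's
A2 = counting_dfa("b")   # at least two b's
A3 = union_product(A1, A2)
```

With these assumed machines, A3 has nine states (pairs of counts), and every transition is labeled with a single symbol, which is the structural point the paragraph above makes about DFAs.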

All of the above holds for the question of equality of REs. However, you asked about equivalence. Almost always, we say two REs are equivalent if they generate the same language. If this is what you meant, then yes, +'ing the two REs gives an RE equivalent to the one obtained by constructing a union machine and deriving an RE from it. The REs will not look the same but will generate the same language, just as (ab + ε)(abab)* and (ab)* generate the same language despite looking a bit different.
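Equivalence in this sense can be spot-checked by brute force. The sketch below is a bounded test, not a proof: it compares two acceptors on every string over {a, b} up to a chosen length. The REs are transcribed into Python's `re` syntax ('+' for union becomes '|', and ε becomes an optional group), and the union machine is simulated directly under the assumption that it tracks the two letter counts capped at 2; all names are invented for this example.

```python
import re
from itertools import product

# R3 = R1 + R2, transcribed: union '+' becomes '|', (a+b) becomes [ab].
R3 = re.compile(r"b*ab*a[ab]*|a*ba*b[ab]*")

def r3_accepts(s):
    return R3.fullmatch(s) is not None

def union_machine_accepts(s):
    """Assumed union machine: track both letter counts, each capped at 2."""
    a_count = b_count = 0
    for c in s:
        if c == "a":
            a_count = min(a_count + 1, 2)
        else:
            b_count = min(b_count + 1, 2)
    return a_count == 2 or b_count == 2

def same_language_up_to(max_len, f, g):
    """True if f and g agree on every string over {a, b} up to max_len."""
    return all(f(s) == g(s)
               for n in range(max_len + 1)
               for s in map("".join, product("ab", repeat=n)))

# The second example from the text: (ab + ε)(abab)* versus (ab)*.
LEFT = re.compile(r"(?:ab)?(?:abab)*")
RIGHT = re.compile(r"(?:ab)*")

def left_accepts(s):
    return LEFT.fullmatch(s) is not None

def right_accepts(s):
    return RIGHT.fullmatch(s) is not None
```

For example, `same_language_up_to(10, r3_accepts, union_machine_accepts)` checks all 2047 strings of length at most 10. A complete decision procedure would instead convert both REs to DFAs and compare them, which is not shown here.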