
How to convert a regular grammar to a regular expression?

Is there an algorithm or tool that converts a regular grammar to a regular expression?

Answer from dalibocai:

My goal is to convert a regular grammar to a DFA. Finally, I found an excellent tool: JFLAP.

The algorithm is pretty straightforward once you can compute an automaton from your regular expression. For instance, for (aa*b|c), an automaton would be (arrows go to the right):

          a
         / \
      a  \ / b
-> 0 ---> 1 ---> 2 ->
    \___________/
          c

Then just "enumerate" your transitions as rules. Below, consider that 0, 1, and 2 are nonterminal symbols, and of course a, b, and c are the tokens.

0: a1 | c2
1: a1 | b2
2: epsilon

or, if you don't want empty right-hand sides:

0: a1 | c
1: a1 | b
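
A minimal Python sketch of that enumeration step, assuming the automaton is given as a list of (source, symbol, target) triples. The function name automaton_to_grammar and this representation are made up for illustration. It produces the first form of the grammar, with the epsilon rule for the accepting state.

```python
def automaton_to_grammar(transitions, accepting):
    """Turn every transition p --x--> q into a rule "p: xq".

    Accepting states additionally get an "epsilon" alternative.
    (Hypothetical helper, written only to illustrate the idea.)
    """
    rules = {}
    for source, symbol, target in transitions:
        rules.setdefault(source, []).append(f"{symbol}{target}")
    for state in accepting:
        rules.setdefault(state, []).append("epsilon")
    return rules


# The automaton drawn above for (aa*b|c); state 2 is accepting.
transitions = [(0, "a", 1), (0, "c", 2), (1, "a", 1), (1, "b", 2)]
grammar = automaton_to_grammar(transitions, accepting={2})

for state in sorted(grammar):
    print(f"{state}: {' | '.join(grammar[state])}")
# 0: a1 | c2
# 1: a1 | b2
# 2: epsilon
```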

And of course, the route in the other direction provides one means to convert a regular grammar into an automaton, and hence into a rational (regular) expression.
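
That reverse route can be sketched in Python as well. This is only an illustration under assumptions: rules are encoded as (terminal, next nonterminal) pairs, with None meaning that no nonterminal follows, and FINAL is a hypothetical extra accepting state. The sketch builds an NFA from the epsilon-free grammar above and compares it with Python's re module on the original expression aa*b|c.

```python
import re
from itertools import product

FINAL = "FINAL"  # hypothetical extra accepting state, not part of the grammar


def grammar_to_nfa(rules):
    """Build NFA transitions from right-linear rules.

    rules maps a nonterminal to a list of (terminal, next_nonterminal) pairs;
    next_nonterminal is None for a rule such as "0: c".
    """
    delta = {}
    for head, alternatives in rules.items():
        for terminal, target in alternatives:
            destination = FINAL if target is None else target
            delta.setdefault((head, terminal), set()).add(destination)
    return delta


def accepts(delta, start, word):
    """Simulate the NFA by tracking the set of reachable states."""
    current = {start}
    for ch in word:
        current = set().union(*(delta.get((state, ch), set()) for state in current))
    return FINAL in current


# 0: a1 | c
# 1: a1 | b
rules = {
    "0": [("a", "1"), ("c", None)],
    "1": [("a", "1"), ("b", None)],
}
nfa = grammar_to_nfa(rules)

# Compare with the regular expression on every word of length <= 5.
mismatches = [
    "".join(letters)
    for n in range(6)
    for letters in product("abc", repeat=n)
    if accepts(nfa, "0", "".join(letters)) != bool(re.fullmatch(r"aa*b|c", "".join(letters)))
]
print(mismatches)
```

An empty list means the automaton and the expression agree on all words tested.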

From a theoretical point of view, an algorithm to solve this problem works by creating a regular-expression equation from the rules of each nonterminal in the grammar, and solving the resulting system of equations for the initial symbol.

For example, for the regular grammar ({S,A},{a,b,c},P,S):

P:
   S -> aA | cS | a  | c
   A -> aA | a  | bS
  1. Take each non-terminal symbol and generate a regular expression from its right-hand side:

     S = aA + cS + a + c
     A = aA + bS + a

  2. Solve the equation system for the initial symbol S, using the fact that X = rX + s has the solution X = r*s (Arden's rule):

     A = aA + (bS + a)
     A = a*(bS + a)
     A = a*bS + a⁺

     S = a(a*bS + a⁺) + cS + a + c
     S = a⁺bS + aa⁺ + cS + a + c
     S = (a⁺b + c)S + a⁺ + c

     substitution: x = a⁺b + c

     S = xS + a⁺ + c
     S = x*(a⁺ + c)
     S = (a⁺b + c)*(a⁺ + c)

Because every step preserves equivalence, (a⁺b + c)*(a⁺ + c) is a regular expression that denotes exactly the words that can be produced from the initial symbol. Thus the language of this expression is the language generated by the grammar whose initial symbol is S.

It ain't pretty, but I purposely picked a grammar including cycles to show the way the algorithm works. The hardest part is recognizing that S = xS | x is equivalent to S = x⁺ (more generally, S = xS | y is equivalent to S = x*y, provided x cannot produce the empty word); after that it is just a matter of doing the substitutions.
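
The substitution procedure can also be mechanised. The sketch below is not a reference implementation: the helper names (union, concat, star, solve, derives) and the rule encoding are assumptions made for this example, and terminals are assumed to be single characters. It takes the productions P exactly as listed above, eliminates one nonterminal at a time using the identity X = rX + s ⇒ X = r*s, and emits a pattern in Python re syntax. A brute-force check over all words up to length 5 then compares the pattern with the grammar.

```python
import re
from itertools import product


def union(r, s):
    """r + s, where None stands for the empty language."""
    if r is None:
        return s
    if s is None:
        return r
    return f"(?:{r}|{s})"


def concat(r, s):
    return None if r is None or s is None else r + s


def star(r):
    return "" if r is None else f"(?:{r})*"


def solve(rules, start):
    """Solve X = sum(coef[X][Y] Y) + const[X] for `start` by elimination."""
    coef = {x: {} for x in rules}
    const = {x: None for x in rules}
    for x, alternatives in rules.items():
        for terminal, y in alternatives:
            t = re.escape(terminal)
            if y is None:
                const[x] = union(const[x], t)
            else:
                coef[x][y] = union(coef[x].get(y), t)

    remaining = set(rules)
    for x in [v for v in rules if v != start] + [start]:
        # X = rX + s  =>  X = r*s
        loop = star(coef[x].pop(x, None))
        const[x] = concat(loop, const[x])
        coef[x] = {y: concat(loop, r) for y, r in coef[x].items()}
        remaining.discard(x)
        # Substitute the solved X into the equations that are still open.
        for z in remaining:
            t = coef[z].pop(x, None)
            if t is None:
                continue
            const[z] = union(const[z], concat(t, const[x]))
            for y, r in coef[x].items():
                coef[z][y] = union(coef[z].get(y), concat(t, r))
    return const[start]


def derives(rules, start, word):
    """Check whether the right-linear grammar derives `word` (None = finished)."""
    states = {start}
    for ch in word:
        states = {y for x in states if x is not None for t, y in rules[x] if t == ch}
    return None in states


# S -> aA | cS | a | c
# A -> aA | a  | bS
P = {
    "S": [("a", "A"), ("c", "S"), ("a", None), ("c", None)],
    "A": [("a", "A"), ("a", None), ("b", "S")],
}
regex = solve(P, "S")
print(regex)

words = ["".join(letters) for n in range(6) for letters in product("abc", repeat=n)]
disagreements = [w for w in words if derives(P, "S", w) != bool(re.fullmatch(regex, w))]
print(disagreements)
```

The generated pattern is heavily parenthesised because no simplification is attempted; an empty list of disagreements indicates that it matches the grammar on every word tested.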

I'll leave this as an answer to this old question, in case anybody finds it useful:

I have recently released a library for exactly that purpose:

https://github.com/rindPHI/grammar2regex

You can precisely convert regular grammars, but also compute approximate regular expressions for more general context-free grammars. The output format can be configured to be a custom ADT type or the regular expression format of the z3 SMT solver (z3.ReRef).

Internally, the tool converts grammars to finite automata. If you're interested in the automaton itself, you can call the method right_linear_grammar_to_nfa.
