How to construct a CFG based on a given regular expression

I am trying to figure out how to construct a CFG (context-free grammar) from a given regular expression, for example a(ab)*(a|b). I think there is an algorithm for this, but I find it really confusing. Here is what I have so far:

    S->aAB; 
    A->aAb|empty;
    B->a|b;

Does this look right? Any help would be appreciated.

Construct the CFG in three parts, one each for a, (ab)* and (a|b).

For (a|b), your production B -> a | b is right.

(ab)* matches strings such as ab, abab, ababab and so on, as well as the empty string. So A -> abA | empty is the correct production.

Hence, the full grammar becomes:

    S -> aAB
    A -> abA | empty
    B -> a | b
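As a sanity check, the grammar can be compared against the regular expression by brute force. The following is a minimal sketch, not a proof: it assumes single-character symbols (uppercase for non-terminals), and the helper names `generate` and `matching` are made up for this illustration. It enumerates every string of length at most 8 that the grammar derives and compares that set with the strings accepted by Python's `re` module.

```python
import re
from collections import deque
from itertools import product

# Grammar from the answer above; "" stands for the empty production.
GRAMMAR = {
    "S": ["aAB"],
    "A": ["abA", ""],
    "B": ["a", "b"],
}

def generate(grammar, start="S", max_len=8):
    """Return every terminal string of length <= max_len derivable from start."""
    results, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost non-terminal, or None if only terminals remain.
        idx = next((i for i, ch in enumerate(form) if ch in grammar), None)
        if idx is None:
            results.add(form)
            continue
        for rhs in grammar[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            # Terminals never disappear, so forms with too many can be pruned.
            terminals = sum(ch not in grammar for ch in new)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return results

def matching(pattern, alphabet="ab", max_len=8):
    """Return every string over alphabet of length <= max_len matching pattern."""
    rx = re.compile(pattern)
    words = ("".join(p) for n in range(max_len + 1)
             for p in product(alphabet, repeat=n))
    return {w for w in words if rx.fullmatch(w)}

print(generate(GRAMMAR) == matching(r"a(ab)*(a|b)"))
```

Agreement up to a bounded length is only evidence that the two describe the same language, but it is usually enough to catch a wrong production quickly.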

Note: A -> aAb | empty derives strings like ab, aabb, aaabbb and so on. That language is not regular, so it can't be described by any regular expression.
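To see the difference concretely, here is a small sketch (the helper `derive_A` is hypothetical, written only for this illustration) that applies A -> aAb a given number of times, finishes with A -> empty, and tests the result against `(ab)*`:

```python
import re

def derive_A(n):
    """Apply A -> aAb n times, then finish with A -> empty."""
    form = "A"
    for _ in range(n):
        form = form.replace("A", "aAb")
    return form.replace("A", "")

for n in range(4):
    s = derive_A(n)
    # Only the first two derivations happen to match (ab)*.
    print(repr(s), bool(re.fullmatch(r"(ab)*", s)))
```

From two applications onward the derived string (aabb, aaabbb, ...) no longer matches `(ab)*`, which is why the nesting rule has to be replaced by the right-recursive A -> abA.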

Another way to construct a context-free grammar for a given regular expression is:

  1. Construct a finite state machine that accepts the same language as the regular expression.
  2. Create a grammar whose terminals are the symbols in the alphabet of the regular expression, whose non-terminals are (or correspond 1:1 to) the states of the state machine, and which has a rule of the form X -> t Y for every state-machine transition from state X to state Y on terminal symbol t. If your CFG notation allows it, each final state F gets a rule of the form F -> epsilon. If your CFG notation doesn't allow such rules, then for each transition from state X to final state F on terminal t, produce the rule X -> t (in addition to the rule X -> t F already described).

The result is a regular grammar: a context-free grammar that obeys the additional constraint that each right-hand side has at most one non-terminal, which in the rules produced here always comes last.

For the example given, assume we construct the following FSA (one of many that accept the same language as the regular expression):

[Figure: FSA for the language a(ab)*(a|b)]

From this, it is straightforward to derive the following regular grammar:

    S -> a A1
    A1 -> a A2
    A2 -> b B3
    B3 -> a A2
    B3 -> a A4
    B3 -> b B5
    A1 -> a A4
    A1 -> b B5
    A4 -> epsilon
    B5 -> epsilon
    epsilon ->
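Because the derivation is mechanical, it is easy to script. The sketch below is one possible way to do it, under stated assumptions: the transition table is reconstructed from the rules listed above (the original diagram is not reproduced here), and the names `TRANSITIONS`, `FINAL_STATES` and `nfa_to_grammar` are invented for this example.

```python
# Transitions (source state, terminal, target state), reconstructed from
# the grammar rules above; this table is an assumption, not the original figure.
TRANSITIONS = [
    ("S", "a", "A1"),
    ("A1", "a", "A2"),
    ("A2", "b", "B3"),
    ("B3", "a", "A2"),
    ("B3", "a", "A4"),
    ("B3", "b", "B5"),
    ("A1", "a", "A4"),
    ("A1", "b", "B5"),
]
FINAL_STATES = ["A4", "B5"]

def nfa_to_grammar(transitions, final_states):
    """One rule X -> t Y per transition, plus F -> epsilon per final state."""
    rules = [f"{src} -> {sym} {dst}" for src, sym, dst in transitions]
    rules += [f"{state} -> epsilon" for state in final_states]
    return rules

for rule in nfa_to_grammar(TRANSITIONS, FINAL_STATES):
    print(rule)
```

The function emits one rule per transition and one per final state, so the size of the grammar is linear in the size of the state machine.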

Or, if we don't want rules with an empty right-hand side, drop the last three rules of that grammar and add:

    A1 -> a
    A1 -> b
    B3 -> a
    B3 -> b
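The epsilon-free variant can be generated and checked the same way. This is a rough sketch under the same assumptions as before (a transition table reconstructed from the rules, and illustrative names such as `epsilon_free_rules` and `language`): every transition into a final state yields an extra rule X -> t, and the strings derivable within a length bound are then enumerated.

```python
# Transitions (source state, terminal, target state); reconstructed from the
# grammar rules above, so treat this table as an assumption.
TRANSITIONS = [
    ("S", "a", "A1"), ("A1", "a", "A2"), ("A2", "b", "B3"),
    ("B3", "a", "A2"), ("B3", "a", "A4"), ("B3", "b", "B5"),
    ("A1", "a", "A4"), ("A1", "b", "B5"),
]
FINAL_STATES = {"A4", "B5"}

def epsilon_free_rules(transitions, final_states):
    """Return rules as (lhs, terminal, rhs) where rhs is a state or None."""
    rules = []
    for src, sym, dst in transitions:
        rules.append((src, sym, dst))        # X -> t Y
        if dst in final_states:
            rules.append((src, sym, None))   # X -> t
    return rules

def language(rules, start="S", max_len=6):
    """All strings of length <= max_len derivable from start."""
    found = set()
    frontier = {("", start)}
    for _ in range(max_len):                 # each round appends one terminal
        next_frontier = set()
        for prefix, state in frontier:
            for lhs, sym, rhs in rules:
                if lhs != state:
                    continue
                if rhs is None:
                    found.add(prefix + sym)  # derivation ends here
                else:
                    next_frontier.add((prefix + sym, rhs))
        frontier = next_frontier
    return found

print(sorted(language(epsilon_free_rules(TRANSITIONS, FINAL_STATES))))
```

In this variant the non-terminals A4 and B5 no longer have any productions, so rules such as A1 -> a A4 become dead ends; they are harmless and could be pruned in a further clean-up pass.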

Compared with other approaches, this method has the disadvantage that the resulting grammar is more verbose than it needs to be, and the advantage that the derivation is entirely mechanical, which makes it easier to get right without having to think hard.
