Looking for advice on making this BNF grammar suitable for LL(1) parsing (left factoring)
I am working on a parsing project that uses an adaptation of this grammar for Perl's regular expressions: http://www.cs.sfu.ca/~cameron/Teaching/384/99-3/regexp-plg.html . I have simplified the grammar for my own purposes, like so (note that, because "|" is a token, I am instead using a comma "," to separate the productions for a given nonterminal):
<RE> := <union>, <simple>
<union> := <RE> | <simple>
<simple> := <concat>, <basic>
<concat> := <simple><basic>
<basic> := <zero>, <one>, <onezero>, <elem>
<zero> := <elem>*
<one> := <elem>+
<onezero> := <elem>?
<elem> := <group>, <any>, <literal>, <class>
<group> := (<RE>)
<class> := [<items>]
<items> := <item>, <item><items>
<item> := <range>, <literal>
I want to write an LL(1) parser to handle this grammar, and the productions for <items> are a problem for an LL(1) parser because both of them begin with <item>. To fix this, I could left-factor them by adding a new nonterminal <X>, like so:
<items> := <item><X>
<X> := <items>, epsilon
But what I'm wondering is, could I just flip the order of the symbols in the second production for <items>, like this:
<items> := <item>, <items><item>
and avoid adding a new nonterminal? It doesn't look like it breaks anything. After all, the whole point of this production is to allow any number of sequential <item> symbols, and we still get that if we reverse the order. Am I missing something, or does simply reversing the order achieve the same goal as left-factoring in this situation?
The problem you are trying to fix is that
items → item
items → item items
is not left-factored; both productions start with item.
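To make the conflict concrete, here is a minimal sketch that compares the FIRST sets of the alternatives. The token name `LITERAL`, the assumption that both a range and a literal begin with a literal token, and the helper `ll1_conflicts` are all hypothetical; the question's grammar does not define `<range>` or `<literal>`.

```python
# Assumption: every <item> (range or literal) begins with a LITERAL token.
FIRST_ITEM = {"LITERAL"}

# Lookahead set of each alternative, keyed by its right-hand side.
original = {
    ("item",): FIRST_ITEM,           # items -> item
    ("item", "items"): FIRST_ITEM,   # items -> item items
}

# After left factoring: items -> item X ; X -> items | epsilon.
# The epsilon alternative is chosen on FOLLOW(X), assumed here to be {"]"}.
factored_x = {
    ("items",): FIRST_ITEM,          # X -> items
    (): {"]"},                       # X -> epsilon
}

def ll1_conflicts(alternatives):
    """Return (rhs_a, rhs_b, shared_tokens) for every overlapping pair."""
    rhs_list = list(alternatives)
    conflicts = []
    for i, a in enumerate(rhs_list):
        for b in rhs_list[i + 1:]:
            shared = alternatives[a] & alternatives[b]
            if shared:
                conflicts.append((a, b, shared))
    return conflicts

original_conflicts = ll1_conflicts(original)
factored_conflicts = ll1_conflicts(factored_x)
```

Under these assumptions the original pair shares `LITERAL`, so a single token of lookahead cannot pick a production, while the two alternatives of the new nonterminal have disjoint lookahead sets.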
Your proposed fix
items → item
items → items item
doesn't really help (whatever starts item can still start either production of items), but more importantly, it is left-recursive, which is verboten for LL grammars.
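As a rough illustration of why left recursion is a problem for a top-down parser, the sketch below transcribes `items → items item` directly into a recursive descent function. The function names and the list-of-strings token format are made up for this example.

```python
def items_left_recursive(tokens, pos):
    # items -> items item: the first step is to parse <items> again at the
    # same position, so the function recurses before consuming any token.
    parsed, pos = items_left_recursive(tokens, pos)
    return parsed + [tokens[pos]], pos + 1

def try_parse(tokens):
    # Python stops the runaway recursion with RecursionError; a table-driven
    # LL(1) parser would instead loop or fail when building its table.
    try:
        return items_left_recursive(tokens, 0)
    except RecursionError:
        return "infinite recursion"

outcome = try_parse(["a", "b", "c"])
```

No amount of lookahead helps here, because the recursive call happens before the parser ever looks at the input.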
In principle, the "new non-terminal" is the correct solution, but in a recursive descent parser, you would probably do something like this:
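def items():
    # items -> item { item }: one item is required, the rest are optional
    result = [item()]
    # keep going while the lookahead token could begin another item
    while couldStart(item, lookahead):
        result.append(item())
    return result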
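For a version that actually runs, here is one possible self-contained sketch of the same loop. It makes several assumptions that are not in the original post: tokens are single characters, a range is written `a-z`, `]` ends the class, and the class `ClassParser` with its method names is hypothetical.

```python
class ClassParser:
    """Parses the inside of a character class, e.g. 'a-z0_' (no brackets)."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def lookahead(self):
        # Next unread character, or None at end of input.
        return self.text[self.pos] if self.pos < len(self.text) else None

    def could_start_item(self):
        # Assumption: any character other than ']' can begin an item.
        la = self.lookahead()
        return la is not None and la != "]"

    def item(self):
        # item -> range | literal, with range assumed to be literal '-' literal.
        if not self.could_start_item():
            raise SyntaxError(f"expected an item at position {self.pos}")
        start = self.text[self.pos]
        self.pos += 1
        # Both alternatives start with a literal, so read it first and only
        # then decide whether a '-' and an end character follow.
        if (self.lookahead() == "-"
                and self.pos + 1 < len(self.text)
                and self.text[self.pos + 1] != "]"):
            end = self.text[self.pos + 1]
            self.pos += 2
            return ("range", start, end)
        return ("literal", start)

    def items(self):
        # items -> item { item }: the loop replaces the extra nonterminal.
        result = [self.item()]
        while self.could_start_item():
            result.append(self.item())
        return result

parsed = ClassParser("a-z0_").items()
```

With the input `a-z0_` this yields one range followed by two literals. The `while` loop plays the role of the left-factored nonterminal: it continues on the lookahead set of `item` and stops on anything else, such as `]`.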