
Translating precedence table into grammar appropriate for recursive descent?

If we have a language consisting only of atomic elements and unary and binary operators:

atomic elements: a b c
unary operators: ! ~ + -
binary operators: + - / *

Then we can define a grammar:

ATOM := a | b | c
UNOP := ! | ~ | + | -
BINOP := + | - | / | *
EXPR := ATOM | UNOP EXPR | EXPR BINOP EXPR

However, this grammar is ambiguous (the same expression can have more than one parse tree), and its left recursion sends a recursive descent parser into an infinite loop.

So we add a precedence table:

Precedence 1: unary+ unary- ~ ! (Right to Left)
Precedence 2: * / (Left to Right)
Precedence 3: binary+ binary- (Left to Right)

My question is: by what algorithm or procedure can we take the precedence table and produce an appropriate grammar (one that is not left-recursive) for a recursive descent parser?

A precedence table is an ordered list of operator groups and associated directions (L->R or R<-L). The answer would be something that takes this as input and produces a grammar as output.

It's easy enough to convert an operator precedence grammar into an LR(1) grammar [1], but the resulting grammar will use left recursion to parse left associative operators. It's also easy enough to eliminate the left recursion (for example, by making all operators right associative), but while the resulting grammar recognizes the same language, the parse trees are different.
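As an illustration of what a table-driven conversion might look like, here is a minimal Python sketch. It is not part of the original answer: the function name `grammar_from_table` and the tuple format of the table are assumptions made for this example. Left associative levels are written with EBNF repetition rather than left recursion, on the understanding that a recursive descent parser would loop over the repetition and fold the operands leftward to recover the left associative tree.

```python
def grammar_from_table(table, atom="ATOM"):
    """Emit one rule per precedence level, tightest level first.

    `table` is a list of (kind, operators, associativity) tuples ordered
    from highest to lowest precedence; kind is "prefix" or "infix".
    (This input format is an assumption of the sketch.)
    """
    rules = []
    operand = atom  # nonterminal of the next-tighter level
    for level, (kind, ops, assoc) in enumerate(table, start=1):
        name = f"E{level}"
        alts = " | ".join(f"'{op}'" for op in ops)
        if kind == "prefix":
            # Prefix operators nest to the right: recurse on the same level.
            rules.append(f"{name} := ({alts}) {name} | {operand}")
        elif assoc == "left":
            # Repetition instead of left recursion; the parser folds left.
            rules.append(f"{name} := {operand} (({alts}) {operand})*")
        else:
            # Right recursion yields a right associative tree directly.
            rules.append(f"{name} := {operand} (({alts}) {name})?")
        operand = name
    rules.append(f"EXPR := {operand}")
    return rules


# The precedence table from the question, highest precedence first.
TABLE = [
    ("prefix", ["+", "-", "~", "!"], "right"),
    ("infix", ["*", "/"], "left"),
    ("infix", ["+", "-"], "left"),
]

for rule in grammar_from_table(TABLE):
    print(rule)
```

For the question's table this prints four rules: `E1 := ('+' | '-' | '~' | '!') E1 | ATOM`, `E2 := E1 (('*' | '/') E1)*`, `E3 := E2 (('+' | '-') E2)*` and `EXPR := E3`.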

It turns out that it's not hard to slightly modify the recursive descent parser to be able to handle precedence relations. The technique was invented by Vaughan Pratt, and essentially uses the call stack in place of the explicit stack in the classic shunting-yard algorithm.
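A minimal sketch of a Pratt-style parser for the language in the question could look like the following Python. The numeric binding powers and the names `BINARY`, `PREFIX` and `parse` are assumptions chosen for this example (a larger number binds tighter); the tree is returned as nested tuples.

```python
# Binding powers are assumptions derived from the question's table:
# a larger number binds tighter.
BINARY = {"+": 10, "-": 10, "*": 20, "/": 20}   # all left associative
PREFIX = {"+": 30, "-": 30, "~": 30, "!": 30}   # unary, right to left
ATOMS = {"a", "b", "c"}


def parse(tokens):
    """Parse a list of single-character tokens into a nested tuple tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expr(min_bp):
        tok = advance()
        if tok in ATOMS:
            left = tok
        elif tok in PREFIX:
            # The operand of a prefix operator is parsed at the operator's
            # own binding power, so only tighter constructs attach to it.
            left = (tok, expr(PREFIX[tok]))
        else:
            raise SyntaxError(f"unexpected token {tok!r}")
        # Absorb binary operators that bind tighter than the caller's limit.
        while peek() in BINARY and BINARY[peek()] > min_bp:
            op = advance()
            # Passing the operator's own power makes it left associative:
            # an equal-power operator to the right is left for this loop.
            right = expr(BINARY[op])
            left = (op, left, right)
        return left

    tree = expr(0)
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

With these assumed binding powers, `parse(list("a+b*c"))` returns `('+', 'a', ('*', 'b', 'c'))` and `parse(list("a-b-c"))` returns `('-', ('-', 'a', 'b'), 'c')`. The recursion in `expr` plays the role of the operator stack in the shunting-yard algorithm.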

Pratt parsing seems to be undergoing some sort of revival, and you can find lots of blog posts about it; one reasonably good one is by Eli Bendersky. Pratt devised the procedure in the early 1970s, about the same time Frank DeRemer was proving that LR(1) parsing was practical. Pratt was skeptical about both the practicality and the inflexibility of formal parsing. I think the debate has pretty well been simmering ever since. Pratt parsers are indeed simple and flexible, but it can be very difficult to prove that they are correct (or that they parse a particular formally-described grammar). On the other hand, although bison has recently acquired support for GLR parsing, making it potentially a lot less fidgety to use, and despite the fact that bison-generated parsers actually parse the grammar they claim to parse, there are still many who would agree with Pratt's statement (from 1973) that formal parsing methods are "less accessible and less pleasant to use".


[1] In practice, all yacc derivatives and many other LR parser generators will accept precedence relations for disambiguation; the resulting grammar tables are smaller and involve fewer unit reductions, so there is no particularly good reason not to use this technique if you're going to use a parser generator.

A general grammar that describes arbitrary precedence can be parsed using LALR parsers, which are table-based and can be generated using yacc. But this doesn't mean all is lost when you wish to use a recursive descent parser.

In this approach the recursive descent parser only verifies that the syntax is correct. Building a syntax tree is a different matter, and precedence should be handled at the tree-building level.

So consider the following grammar without left recursion, which can parse infix expressions. Nothing special, no sign of precedence:

Expr := Term (InfixOp Term)*
InfixOp := '+' | '-' | '*' | '/'
Term := '(' Expr ')'
Term := identifier

(The notation on the right-hand side is regex-like: rule names are written in upper camel case, and tokens are either quoted or written in lower camel case.)

When building the syntax tree you have a current node, to which you add new nodes.

Most often, when you parse a rule you create a new child node on the current node and make that child the current node. When you finish parsing the rule, you step back up to the parent node.

This is what should be done differently when you parse the InfixOp rule: you should assign a precedence strength to the relevant nodes. The Expr node has the weakest precedence, while all operators have stronger ones.

Handling infix precedence

When parsing the InfixOp rule, do the following:

  1. While the current node's precedence is stronger than the incoming node's precedence, keep going up one level (make the parent node the current node).

  2. Then insert the incoming node as the parent of the last child of the current node, and make the incoming node current.

Since the Expr node is declared to have the weakest precedence, it ultimately stops the climbing.
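The two steps above can be sketched in Python roughly as follows. The `Node` class, the `PREC` table and the `build` function are hypothetical names introduced for this illustration. The sketch assumes well-formed input made of single-character tokens, leaves out parentheses, and does not yet treat operators of equal precedence specially.

```python
class Node:
    def __init__(self, label, prec):
        self.label = label
        self.prec = prec          # precedence strength; larger is stronger
        self.parent = None
        self.children = []

    def add(self, child):
        child.parent = self
        self.children.append(child)

    def show(self):
        """Render the subtree as an S-expression string."""
        if not self.children:
            return self.label
        inner = " ".join(c.show() for c in self.children)
        return f"({self.label} {inner})"


PREC = {"+": 1, "-": 1, "*": 2, "/": 2}   # assumed strengths


def build(tokens):
    root = Node("Expr", 0)        # weakest precedence: climbing stops here
    current = root
    for tok in tokens:
        if tok in PREC:
            incoming = Node(tok, PREC[tok])
            # Step 1: climb while the current node is stronger.
            while current.prec > incoming.prec:
                current = current.parent
            # Step 2: the incoming node becomes the parent of the
            # current node's last child, then becomes current itself.
            last = current.children.pop()
            current.add(incoming)
            incoming.add(last)
            current = incoming
        else:
            current.add(Node(tok, None))   # operand: a leaf, never current
    return root
```

Under these assumptions, `build(list("A+B*C")).show()` gives `(Expr (+ A (* B C)))` and `build(list("A*B+C")).show()` gives `(Expr (+ (* A B) C))`.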

Let's see an example: A+B*C

Here the current node is always marked with ! after consuming the current token.

Parsed: none

Expr!

Parsed: A

Expr!
|
A

Parsed: A+

Expr
|
+!
|
A

Parsed: A+B

  Expr
  |
  +!
 / \
A   B

Parsed: A+B*

  Expr
  |
  +
 / \
A   *!
   /
  B

Parsed: A+B*C

  Expr
  |
  +
 / \
A   *!
   / \
  B   C

If you traverse this tree in postorder, you get the reverse Polish notation for the expression, which can be used to evaluate it.

Or the reverse example: A*B+C

Parsed: none

Expr!

Parsed: A

Expr!
|
A

Parsed: A*

Expr
|
*!
|
A

Parsed: A*B

  Expr
  |
  *!
 / \
A   B

Parsed: A*B+

  Expr
  |
  +!
  |
  *
 / \
A   B

Parsed: A*B+C

    Expr
    |
    +!
   / \
  *   C
 / \
A   B

Handling associativity

Some operators are left associative while others are right associative. For example, in the C language family + is left associative while = is right associative.

Actually, the whole associativity thing boils down to the handling of operators on the same precedence level. For left associative operators, keep climbing when you encounter an operator on the same precedence level. For right associative operators, stop when you encounter an operator on the same precedence level.

(It takes too much space to demonstrate all techniques; I recommend trying it out on a piece of paper.)
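As a rough illustration of the associativity rule, the following Python sketch keeps the branch from the root to the current node in a list called `path` and represents each tree node as a plain list `[label, child, ...]`. The `OPS` table, its numeric strengths and the `build` function are assumptions made for this example, and the input is assumed to be well formed.

```python
# (precedence, associativity) per operator; the values are illustrative.
OPS = {
    "=": (1, "right"),
    "+": (2, "left"), "-": (2, "left"),
    "*": (3, "left"), "/": (3, "left"),
}


def build(tokens):
    root = ["Expr"]
    path = [(root, 0)]            # (node, precedence) from root to current
    for tok in tokens:
        if tok not in OPS:
            path[-1][0].append(tok)        # operand becomes the last child
            continue
        prec, assoc = OPS[tok]
        while True:
            cur_prec = path[-1][1]
            stronger = cur_prec > prec
            # Same level: keep climbing only for left associative operators.
            same_level_left = cur_prec == prec and assoc == "left"
            if not (stronger or same_level_left):
                break
            path.pop()                     # climb one level
        parent = path[-1][0]
        node = [tok, parent.pop()]         # adopt the parent's last child
        parent.append(node)
        path.append((node, prec))
    return root[1]
```

With this table, `build(list("a-b-c"))` yields `['-', ['-', 'a', 'b'], 'c']`, while `build(list("a=b=c"))` yields `['=', 'a', ['=', 'b', 'c']]`.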

Handling prefix and postfix operators

In this case you need to modify the grammar a bit:

Expr := PrefixOp* Term PostfixOp* (InfixOp PrefixOp* Term PostfixOp*)*
InfixOp := '+' | '-' | '*' | '/'
Term := '(' Expr ')'
Term := identifier

When you encounter a prefix operator, just add it as a new child of the current node and make the new child current, regardless of precedence. This is correct whether it is a strong operator or a weak one; the precedence climbing rules of the infix operators ensure correctness.

For postfix operators you can use the same precedence climbing I described for infix operators. The only difference is that a postfix operator has no right-hand side, so its node will have only one child.
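A small Python sketch of the prefix and postfix handling might look like this. The operator tables, the postfix `!` operator, the `pre`/`post` label prefixes and the `expect_operand` flag (used to tell a prefix `-` from an infix `-`) are all assumptions of this example. Nodes are plain lists `[label, child, ...]`, and infix operators are treated as left associative by climbing on equal precedence.

```python
INFIX = {"+": 1, "-": 1, "*": 2, "/": 2}
PREFIX = {"-": 3, "~": 3}      # hypothetical strengths
POSTFIX = {"!": 4}             # hypothetical postfix operator


def build(tokens):
    root = ["Expr"]
    path = [(root, 0)]             # (node, precedence) from root to current
    expect_operand = True          # tells a prefix '-' from an infix '-'

    def climb_and_wrap(label, prec):
        # Shared by infix and postfix: climb, then adopt the last child.
        while path[-1][1] >= prec:
            path.pop()
        parent = path[-1][0]
        node = [label, parent.pop()]
        parent.append(node)
        path.append((node, prec))

    for tok in tokens:
        if expect_operand and tok in PREFIX:
            node = ["pre" + tok]
            path[-1][0].append(node)          # no climbing for prefix
            path.append((node, PREFIX[tok]))
        elif expect_operand and tok.isalnum():
            path[-1][0].append(tok)           # plain operand
            expect_operand = False
        elif not expect_operand and tok in POSTFIX:
            climb_and_wrap("post" + tok, POSTFIX[tok])   # one child only
        elif not expect_operand and tok in INFIX:
            climb_and_wrap(tok, INFIX[tok])
            expect_operand = True
        else:
            raise SyntaxError(f"unexpected token {tok!r}")
    return root[1]
```

For instance, `build(list("-a*b"))` gives `['*', ['pre-', 'a'], 'b']` and `build(list("a+b!*c"))` gives `['+', 'a', ['*', ['post!', 'b'], 'c']]`.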

Handling ternary operators

The C language family has the ?: ternary operator. With regard to building the syntax tree, you can handle ? and : as separate infix operators. But there is a trick. The node you create for the ? should be an incomplete ternary node: you do the usual precedence climbing and place it, but this incomplete node has the lowest precedence, which prevents even weaker operators such as the comma operator from climbing over it. When you reach the :, you must climb up to the first incomplete ternary node (if you don't find one, report a syntax error), then change it to a complete node, which has normal precedence, and make it current. If you reach the end of the expression while there are still incomplete ternary nodes on the current branch, again report a syntax error.

So a , b ? c : d is interpreted as a , (b ? c : d).

But a ? c , d : e will be interpreted as a ? (c , d) : e, since we prevented the comma from climbing over the ?.
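The incomplete-node trick can be sketched in Python along these lines. The `OPS` table and its strengths, the `LOWEST` constant, and the convention of labelling an incomplete node `?` and a completed one `?:` are assumptions of this example. Nodes are plain lists `[label, child, ...]`, `path` holds the branch from the root to the current node, and operands are assumed to be single-character tokens.

```python
OPS = {",": (1, "left"), "?": (2, "right"), "+": (3, "left"), "*": (4, "left")}
LOWEST = 0   # shared by Expr and by incomplete ternary nodes


def build(tokens):
    root = ["Expr"]
    path = [[root, LOWEST]]          # [node, precedence] from root to current
    for tok in tokens:
        if tok == ":":
            # Climb to the nearest incomplete ternary node, labelled "?".
            while path and path[-1][0][0] != "?":
                path.pop()
            if not path:
                raise SyntaxError("':' without a matching '?'")
            path[-1][0][0] = "?:"            # the node is now complete
            path[-1][1] = OPS["?"][0]        # restore its normal precedence
        elif tok in OPS:
            prec, assoc = OPS[tok]
            while path[-1][1] > prec or (path[-1][1] == prec and assoc == "left"):
                path.pop()
            parent = path[-1][0]
            node = [tok, parent.pop()]       # adopt the parent's last child
            parent.append(node)
            # An incomplete ternary node must not be climbed over.
            path.append([node, LOWEST if tok == "?" else prec])
        else:
            path[-1][0].append(tok)          # operand
    if any(node[0] == "?" for node, _ in path):
        raise SyntaxError("'?' without a matching ':'")
    return root[1]
```

With these assumptions, `build(list("a,b?c:d"))` returns `[',', 'a', ['?:', 'b', 'c', 'd']]` and `build(list("a?c,d:e"))` returns `['?:', 'a', [',', 'c', 'd'], 'e']`, matching the two interpretations described above.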

Handling array indexes and function calls

Despite their postfix appearance, these are infix operators with a syntactically enforced parenthesized (or bracketed) term on the right. This is true for array indexes and function calls alike.
