
Pretty Printing AST with Minimal Parentheses

I'm implementing a pretty-printer for a JavaScript AST, and I wanted to ask whether anyone knows of a "proper" algorithm for automatically parenthesizing expressions with minimal parentheses, based on operator precedence and associativity. I haven't found any useful material on Google.

What seems obvious is that an operator whose parent has a higher precedence should be parenthesized, e.g.:

(x + y) * z // x + y has lower precedence

However, there are also some operators which are not associative, in which case parentheses are still needed, e.g.:

x - (y - z) // both operators have the same precedence

I'm wondering what the best rule for this latter case would be. Is it sufficient to say that, for division and subtraction, the right-hand sub-expression should be parenthesized if its precedence is less than or equal to its parent's?

I stumbled on your question while searching for the answer myself. While I haven't found a canonical algorithm, I have found that, like you say, operator precedence alone is not enough to minimally parenthesize expressions. I took a shot at writing a JavaScript pretty printer in Haskell, though I found it tedious to write a robust parser, so I changed the concrete syntax: https://gist.github.com/kputnam/5625856

In addition to precedence, you must take operator associativity into account. Binary operations like / and - are parsed as left associative. However, assignment = and exponentiation ^ (spelled ** in JavaScript) are right associative; equality == is left associative in JavaScript. This means the expression Div (Div a b) c can be written a / b / c without parentheses, but Exp (Exp a b) c must be parenthesized as (a ^ b) ^ c.

Your intuition is correct: for left-associative operators, if the left operand's expression binds less tightly than its parent, it should be parenthesized. If the right operand's expression binds as tightly as or less tightly than its parent, it should be parenthesized. For right-associative operators, the two sides swap roles. So Div (Div a b) (Div c d) wouldn't require parentheses around the left subexpression, but the right subexpression would: a / b / (c / d).
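The rule above can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the AST shape (a leaf is a string, a binary node is a tuple `(op, left, right)`), the names `OPS`, `show` and `operand`, and the small operator table are all hypothetical, and the table is not JavaScript's full precedence table.

```python
# Hypothetical operator table: op -> (precedence, associativity).
# Higher numbers bind more tightly. Illustrative only.
OPS = {
    "=": (1, "right"),
    "+": (2, "left"),
    "-": (2, "left"),
    "*": (3, "left"),
    "/": (3, "left"),
    "**": (4, "right"),
}


def show(node):
    """Print a node; leaves are strings, binary nodes are (op, left, right)."""
    if isinstance(node, str):
        return node
    op, left, right = node
    prec, assoc = OPS[op]
    # An equal-precedence child may stay bare only on the side the operator
    # associates towards: the left side for "left", the right side for "right".
    lhs = operand(left, prec, assoc == "left")
    rhs = operand(right, prec, assoc == "right")
    return f"{lhs} {op} {rhs}"


def operand(child, parent_prec, same_prec_ok):
    """Print a child, adding parentheses only when the rule requires them."""
    text = show(child)
    if isinstance(child, str):
        return text
    child_prec = OPS[child[0]][0]
    needs_parens = child_prec < parent_prec or (
        child_prec == parent_prec and not same_prec_ok
    )
    return f"({text})" if needs_parens else text
```

For instance, `show(("/", ("/", "a", "b"), ("/", "c", "d")))` gives `a / b / (c / d)`. Note that this sketch preserves the shape of the tree, so it also keeps the parentheses in `x + (y - z)`, even though dropping them would not change the value for exact arithmetic.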

Next, unary operators, specifically operators which can be either binary or unary, like negation and subtraction -, coercion and addition +, etc., might need to be handled on a case-by-case basis. For example, Sub a (Neg b) should be printed as a - (-b), even though unary negation binds more tightly than subtraction. I guess it depends on your parser; a - -b may not be ambiguous, just ugly.
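One possible case-by-case policy, sketched in Python: when joining a binary + or - with a right operand that has already been printed, wrap the operand in parentheses if its text starts with the same sign character. The helper name `join_binary` is hypothetical, and the check is purely textual; it is a readability choice under my own assumptions, not something a grammar requires.

```python
def join_binary(left: str, op: str, right: str) -> str:
    """Join two printed operands, avoiding runs like `- -b` or `+ ++b`."""
    # Only the sign operators can collide with a unary prefix of the same
    # character, so every other operator is joined without extra parentheses.
    if op in ("+", "-") and right.startswith(op):
        right = f"({right})"
    return f"{left} {op} {right}"
```

With this, `join_binary("a", "-", "-b")` yields `a - (-b)`, while `join_binary("a", "*", "-b")` stays `a * -b`.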

I'm not sure how unary operators which can be both prefix and postfix should work. In expressions like ++ (a ++) and (++ a) ++, one of the operators must bind more tightly than the other, or ++ a ++ would be ambiguous. But I suspect that even if parentheses aren't needed in one of those, you may want to add them anyway for the sake of readability.

It depends on the rules for the specific grammar. I think you have it right for operators with different precedence, and right for subtraction and division.

Exponentiation, however, is often treated differently, in that it is right associative: its right-hand operand is grouped first. So you need

 (a ** b) ** c

when c is the right child of the root.

Which way the parenthesization goes is determined by what the grammar rules define. If your grammar is of the form

exp = sub1exp ;
exp = sub1exp op exp ;
sub1exp = sub2exp ;
sub1exp = sub1exp op1 sub2exp ;
sub2exp = sub3exp ;
sub2exp = sub3exp op2 sub2exp ;
sub3exp = ....
subNexp = '(' exp ')' ;

with op1 and op2 being non-associative operations (op1 groups to the left and op2 to the right in this grammar), then you want to parenthesize the right subtree of op1 if the subtree root is also op1, and you want to parenthesize the left subtree of op2 if the left subtree has root op2.

There is a generic approach to pretty printing expressions with minimal parentheses. Begin by defining an unambiguous grammar for your expression language which encodes precedence and associativity rules. For example, say I have a language with three binary operators (*, +, @) and a unary operator (~); then my grammar might look like

E -> E0

E0 -> E1 '+' E0       (+ right associative, lowest precedence)
E0 -> E1

E1 -> E1 '*' E2       (* left associative; @ non-associative; same precedence)
E1 -> E2 '@' E2
E1 -> E2

E2 -> '~' E2          (~ binds the tightest)
E2 -> E3

E3 -> Num             (atomic expressions are numbers and parenthesized expressions)
E3 -> '(' E0 ')'

Parse trees for the grammar contain all necessary (and unnecessary) parentheses, and it is impossible to construct a parse tree whose flattening results in an ambiguous expression. For example, there is no parse tree for the string

1 @ 2 @ 3

because '@' is non-associative and always requires parentheses. On the other hand, the string

1 @ (2 @ 3)

has the parse tree

E(E0(E1( E2(E3(Num(1)))
         '@'
         E2(E3( '('
                E0(E1(E2(E3(Num(2)))
                      '@'
                      E2(E3(Num(3)))))
                ')')))))

The problem is thus reduced to the problem of coercing an abstract syntax tree to a parse tree. The minimal number of parentheses is obtained by avoiding coercing an AST node to an atomic expression whenever possible. This is easy to do in a systematic way:

Maintain a pair consisting of a pointer to the current node in the AST and the current production being expanded. Initialize the pair with the root AST node and the 'E' production. In each case for the possible forms of the AST node, expand the grammar as much as necessary to encode the AST node. This will leave an unexpanded grammar production for each AST subtree. Apply the method recursively to each (subtree, production) pair.

For example, if the AST is (* (+ 1 2) 3), then proceed as follows:

expand[ (* (+ 1 2) 3); E ]  -->  E( E0( E1( expand[(+ 1 2) ; E1]
                                            '*'
                                            expand[3 ; E2] ) ) )

expand[ (+ 1 2) ; E1 ] --> E1(E2(E3( '('
                                     E0( expand[ 1 ; E1 ]
                                         '+'
                                         expand[ 2 ; E0 ] )
                                     ')' )))

...

The algorithm can of course be implemented in a much less explicit way, but the method can be used to guide an implementation without going insane :).
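As one such less explicit version, here is a minimal Python sketch for the example grammar above. It replaces "current production" with a level number (0 for E0 up to 3 for E3) and prints the string directly instead of building the parse tree. The AST encoding (an int for Num, `("~", x)` for the unary operator, `(op, left, right)` for binary operators) and the names `RULES` and `emit` are assumptions made for illustration.

```python
# For each binary operator: (level of the production that introduces it,
# level required of its left operand, level required of its right operand).
# Read directly off the grammar:
#   E0 -> E1 '+' E0,  E1 -> E1 '*' E2,  E1 -> E2 '@' E2
RULES = {
    "+": (0, 1, 0),
    "*": (1, 1, 2),
    "@": (1, 2, 2),
}


def emit(node, level=0):
    """Render `node` as a string derivable from the nonterminal E<level>."""
    if isinstance(node, int):  # E3 -> Num fits at every level
        return str(node)
    if node[0] == "~":  # E2 -> '~' E2
        own = 2
        text = "~" + emit(node[1], 2)
    else:
        op, left, right = node
        own, left_level, right_level = RULES[op]
        text = f"{emit(left, left_level)} {op} {emit(right, right_level)}"
    # A node introduced at level `own` is reachable from any looser level via
    # the unit productions E0 -> E1 -> E2 -> E3. From a tighter level the only
    # route is E3 -> '(' E0 ')', which is exactly where parentheses appear.
    return text if own >= level else f"({text})"
```

For the AST (* (+ 1 2) 3) this gives `(1 + 2) * 3`, matching the expansion shown above, and `emit(("@", 1, ("@", 2, 3)))` gives `1 @ (2 @ 3)`.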


 