
Relation between grammar and operator associativity

Some compiler books / articles / papers talk about the design of a grammar and its relation to the associativity of its operators. I'm a big fan of top-down parsers, especially recursive descent ones, and so far most (if not all) of the compilers I've written use the following expression grammar:

Expr   ::= Term { ( "+" | "-" ) Term }
Term   ::= Factor { ( "*" | "/" ) Factor }
Factor ::= INTEGER | "(" Expr ")"

which is an EBNF representation of this BNF:

Expr  ::= Term Expr'
Expr' ::= ( "+" | "-" ) Term Expr' | ε
Term  ::= Factor Term'
Term' ::= ( "*" | "/" ) Factor Term' | ε
Factor ::= INTEGER | "(" Expr ")"
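A minimal sketch of how the EBNF form is typically coded in recursive descent, in Python (the language, the tuple tree shape `(op, left, right)`, and the names `tokenize`, `Parser` and `parse` are illustrative assumptions, not taken from the post). Each `{ ... }` repetition becomes a `while` loop, and the tree built so far becomes the left child of the next operator node, so chains of the same precedence lean left:

```python
import re

def tokenize(text):
    # Integers, the four operators and parentheses; anything else is ignored.
    return re.findall(r"\d+|[-+*/()]", text)

class Parser:
    def __init__(self, text):
        self.tokens, self.pos = tokenize(text), 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        # Expr ::= Term { ( "+" | "-" ) Term }
        node = self.term()
        while self.peek() in {"+", "-"}:
            op = self.advance()
            node = (op, node, self.term())   # tree so far becomes the LEFT child
        return node

    def term(self):
        # Term ::= Factor { ( "*" | "/" ) Factor }
        node = self.factor()
        while self.peek() in {"*", "/"}:
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        # Factor ::= INTEGER | "(" Expr ")"
        tok = self.advance()
        if tok == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

def parse(text):
    p = Parser(text)
    tree = p.expr()
    if p.peek() is not None:
        raise SyntaxError(f"trailing input at {p.peek()!r}")
    return tree

print(parse("8 - 3 - 2"))
```

With this sketch, `parse("8 - 3 - 2")` gives `('-', ('-', 8, 3), 2)`, i.e. (8 - 3) - 2. The tokenizer silently skips characters it does not recognize, which is fine for a demo but not for real input.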

According to what I've read, some regard this grammar as "wrong" because it changes the operators' associativity (left to right for those 4 operators), as evidenced by the parse tree growing to the right instead of to the left. For a parser implemented through an attribute grammar, this might be true, since an L-attributed (inherited) value requires that the value be created first and then passed to the child nodes. However, when implementing a normal recursive descent parser, it's up to me whether to construct the node first and then pass it to the child nodes (top-down), or to let the child nodes be created first and then add the returned values as children of this node, passing them to the node's constructor (bottom-up). There must be something I'm missing here, because I don't agree with the statement that this grammar is "wrong", and this grammar has been used in many languages, especially Wirthian ones. Usually (or always?) the sources that say this are the ones promoting LR parsing over LL.
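To make the concern concrete, here is a hypothetical sketch (Python; `NaiveParser`, `attach` and `evaluate` are made-up names) of what happens if the BNF is followed literally and every primed rule builds its subtree bottom-up without knowing what lies to its left. The resulting tree leans right, so subtraction and division come out right-associative. This is one possible implementation, not something recursive descent forces:

```python
import operator
import re

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def tokenize(text):
    # Integers, the four operators and parentheses; anything else is ignored.
    return re.findall(r"\d+|[-+*/()]", text)

class NaiveParser:
    """Follows the BNF literally: each primed rule builds its own subtree first
    (bottom-up) and returns it as (op, subtree); nothing flows in from the left."""

    def __init__(self, text):
        self.tokens, self.pos = tokenize(text), 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    @staticmethod
    def attach(left, tail):
        # tail is None for ε, otherwise the (op, right_subtree) pair from a primed rule
        if tail is None:
            return left
        op, right = tail
        return (op, left, right)

    def expr(self):                     # Expr  ::= Term Expr'
        return self.attach(self.term(), self.expr_prime())

    def expr_prime(self):               # Expr' ::= ( "+" | "-" ) Term Expr' | ε
        if self.peek() in {"+", "-"}:
            op = self.advance()
            operand = self.term()
            return (op, self.attach(operand, self.expr_prime()))
        return None

    def term(self):                     # Term  ::= Factor Term'
        return self.attach(self.factor(), self.term_prime())

    def term_prime(self):               # Term' ::= ( "*" | "/" ) Factor Term' | ε
        if self.peek() in {"*", "/"}:
            op = self.advance()
            operand = self.factor()
            return (op, self.attach(operand, self.term_prime()))
        return None

    def factor(self):                   # Factor ::= INTEGER | "(" Expr ")"
        tok = self.advance()
        if tok == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

def evaluate(tree):
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))

tree = NaiveParser("8 - 3 - 2").expr()
print(tree, evaluate(tree))
```

Here `8 - 3 - 2` is parsed as `8 - (3 - 2)` and evaluates to 7 rather than the conventional 3. Passing the tree built so far down into the primed rule, or using loops, avoids this.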

I think the issue here is that a language has an abstract syntax, which is something like:

E ::= E + E | E - E | E * E | E / E | Int | (E)

but this is actually implemented via a concrete syntax, which is used to specify associativity and precedence. So, if you're writing a recursive descent parser, you're implicitly writing the concrete syntax into it as you go along, and that's fine, though it may be good to also specify it exactly as a phrase-structure grammar!
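A small illustration of why the abstract syntax alone is not enough, as a Python sketch with made-up class names (`Int`, `BinOp`, `evaluate`): the single rule `E ::= E - E` admits two different trees for the text `1 - 2 - 3`, and they evaluate differently. The concrete syntax is what picks one of them:

```python
import operator
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Int:
    value: int

@dataclass(frozen=True)
class BinOp:
    op: str          # one of "+", "-", "*", "/"
    left: "Expr"
    right: "Expr"

# Abstract syntax: E ::= E + E | E - E | E * E | E / E | Int | (E)
# Parentheses need no node of their own; they only decide the tree's shape.
Expr = Union[Int, BinOp]

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def evaluate(e: Expr):
    if isinstance(e, Int):
        return e.value
    return OPS[e.op](evaluate(e.left), evaluate(e.right))

# Two trees for the same text "1 - 2 - 3", both allowed by E ::= E - E.
left_assoc = BinOp("-", BinOp("-", Int(1), Int(2)), Int(3))    # (1 - 2) - 3
right_assoc = BinOp("-", Int(1), BinOp("-", Int(2), Int(3)))   # 1 - (2 - 3)

print(evaluate(left_assoc), evaluate(right_assoc))
```

(1 - 2) - 3 gives -4, while 1 - (2 - 3) gives 2.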

There are a couple of issues with your grammar if it is to be a fully-fledged concrete grammar. First of all, you need to add productions that just 'go to the next level down', so relaxing your syntax a bit:

Expr ::= Term + Term | Term - Term | Term
Term ::= Factor * Factor | Factor / Factor | Factor
Factor ::= INTEGER | (Expr)

Otherwise there's no way to derive valid sentences starting from the start symbol (in this case Expr). For example, how would you derive '1 * 2' without those extra productions?

Expr -> Term
     -> Factor * Factor
     -> 1 * Factor
     -> 1 * 2

We can see the other grammar handles this in a slightly different way:

Expr -> Term Expr'
     -> Factor Term' Expr'
     -> 1 Term' Expr'
     -> 1 * Factor Term' Expr'
     -> 1 * 2 Term' Expr'
     -> 1 * 2 ε Expr'
     -> 1 * 2 ε ε
      = 1 * 2

but this achieves the same effect.

Your parser is actually non-associative. To see this, ask how E + E + E would be parsed, and you'll find that it can't be. Whichever + is consumed first, we get E on one side and E + E on the other, but then we're trying to parse E + E as a Term, which is not possible. Equivalently, think about deriving that expression from the start symbol; again, it's not possible.

Expr -> Term + Term
     -> ? (can't get another + in here)

The other grammar is left-associative because an arbitrarily long string of E + E + ... + E can be derived.
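Both claims can be checked with a pair of small recognizers, sketched here in Python (the class names are hypothetical). `Relaxed` encodes the relaxed grammar above, with the shared prefix `Term` factored out so it can be parsed deterministically; `Primed` encodes the `Expr'`/`Term'` grammar. Each reports only whether the whole input is accepted; neither builds a tree:

```python
import re

def tokenize(text):
    # Integers, the four operators and parentheses; anything else is ignored.
    return re.findall(r"\d+|[-+*/()]", text)

class Recognizer:
    """Token cursor shared by both grammars; subclasses define expr() and term()."""

    def __init__(self, text):
        self.tokens, self.pos = tokenize(text), 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, allowed):
        # Consume the next token if it is in the set `allowed`; report whether we did.
        if self.peek() in allowed:
            self.pos += 1
            return True
        return False

    def factor(self):
        # Factor ::= INTEGER | "(" Expr ")"
        tok = self.peek()
        if tok is not None and tok.isdigit():
            self.pos += 1
            return True
        return self.eat({"("}) and self.expr() and self.eat({")"})

    @classmethod
    def accepts(cls, text):
        # Start from Expr and require that the whole input is consumed.
        r = cls(text)
        return r.expr() and r.peek() is None

class Relaxed(Recognizer):
    # Expr ::= Term + Term | Term - Term | Term   (at most ONE additive operator)
    def expr(self):
        if not self.term():
            return False
        return self.term() if self.eat({"+", "-"}) else True

    # Term ::= Factor * Factor | Factor / Factor | Factor
    def term(self):
        if not self.factor():
            return False
        return self.factor() if self.eat({"*", "/"}) else True

class Primed(Recognizer):
    def expr(self):                 # Expr  ::= Term Expr'
        return self.term() and self.expr_prime()

    def expr_prime(self):           # Expr' ::= ( "+" | "-" ) Term Expr' | ε
        if self.eat({"+", "-"}):
            return self.term() and self.expr_prime()
        return True

    def term(self):                 # Term  ::= Factor Term'
        return self.factor() and self.term_prime()

    def term_prime(self):           # Term' ::= ( "*" | "/" ) Factor Term' | ε
        if self.eat({"*", "/"}):
            return self.factor() and self.term_prime()
        return True

for text in ["1 * 2", "(1 + 2) + 3", "1 + 2 + 3"]:
    print(text, Relaxed.accepts(text), Primed.accepts(text))
```

`Relaxed` accepts `1 * 2` and `(1 + 2) + 3` (the parentheses restart at `Expr`) but rejects `1 + 2 + 3`, while `Primed` accepts `1 + 2 + 3` and longer chains.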

So anyway, to sum up: you're right that when writing the RDP, you can implement whatever concrete version of the abstract syntax you like, and you probably know a lot more about that than I do. But these issues arise when trying to produce a grammar that describes your RDP precisely. Hope that helps!

To get associative trees, you really need the trees to be formed with the operator as the root node of each subtree, with the children having similar roots.

Your implementation grammar:

Expr  ::= Term Expr'
Expr' ::= ( "+" | "-" ) Term Expr' | ε
Term  ::= Factor Term'
Term' ::= ( "*" | "/" ) Factor Term' | ε
Factor ::= INTEGER | "(" Expr ")"

must make that awkward; if you implement recursive descent on this, the Expr' routine has no access to the "left child" and so can't build the tree. You can always patch this up by passing pieces around (in this case, passing tree parts through the recursion), but that just seems awkward. You could have chosen this grammar instead:

Expr   ::= Term ( ( "+" | "-" ) Term )*
Term   ::= Factor ( ( "*" | "/" ) Factor )*
Factor ::= INTEGER | "(" Expr ")"

which is just as easy (easier?) to code as recursive descent, but now you can form the trees you need without trouble.
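For comparison, the patch-up described above for the `Expr'` grammar can be sketched like this (Python, hypothetical names): the tree built so far is passed down into the primed routine as an argument (an inherited attribute, in attribute-grammar terms), and each new operator node takes it as its left child. It yields the same left-leaning trees as the loop form, at the cost of an extra parameter:

```python
import re

def tokenize(text):
    # Integers, the four operators and parentheses; anything else is ignored.
    return re.findall(r"\d+|[-+*/()]", text)

class Parser:
    def __init__(self, text):
        self.tokens, self.pos = tokenize(text), 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):                      # Expr  ::= Term Expr'
        return self.expr_prime(self.term())

    def expr_prime(self, left):          # Expr' ::= ( "+" | "-" ) Term Expr' | ε
        # `left` is the tree built so far; the new node takes it as its LEFT child.
        if self.peek() in {"+", "-"}:
            op = self.advance()
            return self.expr_prime((op, left, self.term()))
        return left                      # ε: hand the finished tree back up

    def term(self):                      # Term  ::= Factor Term'
        return self.term_prime(self.factor())

    def term_prime(self, left):          # Term' ::= ( "*" | "/" ) Factor Term' | ε
        if self.peek() in {"*", "/"}:
            op = self.advance()
            return self.term_prime((op, left, self.factor()))
        return left

    def factor(self):                    # Factor ::= INTEGER | "(" Expr ")"
        tok = self.advance()
        if tok == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

def parse(text):
    p = Parser(text)
    tree = p.expr()
    if p.peek() is not None:
        raise SyntaxError(f"trailing input at {p.peek()!r}")
    return tree

print(parse("8 - 3 - 2"))
```

With this sketch, `parse("8 - 3 - 2")` returns `('-', ('-', 8, 3), 2)`. The tail recursion in `expr_prime` does the same job as the loop in the `( ... )*` form, just written as recursion.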

This doesn't really get you associativity; it just shapes the trees so that associativity could be allowed. Associativity means that the tree (+ (+ a b) c) means the same thing as (+ a (+ b c)); it's actually a semantic property (it certainly doesn't hold for "-", but the grammar as posed can't make that distinction).
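Since associativity is a property of the operator rather than of the tree shape, it can be spot-checked by evaluating both nestings. A Python sketch with hypothetical helper names, assuming small integer samples (division is left out to avoid zero divisors and float rounding); agreement on samples is evidence, not a proof:

```python
import itertools
import operator

def left_nested(f, a, b, c):
    return f(f(a, b), c)   # the tree (op (op a b) c)

def right_nested(f, a, b, c):
    return f(a, f(b, c))   # the tree (op a (op b c))

def looks_associative(f, samples=range(-3, 4)):
    # Compare both nestings on every triple of samples. Agreement is only
    # evidence of associativity; a single disagreement disproves it.
    return all(left_nested(f, a, b, c) == right_nested(f, a, b, c)
               for a, b, c in itertools.product(samples, repeat=3))

for name, f in [("+", operator.add), ("*", operator.mul), ("-", operator.sub)]:
    print(name, looks_associative(f))
```

On these samples `+` and `*` pass, while `-` fails as soon as `c` is non-zero.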

We have a tool (the DMS Software Reengineering Toolkit) that includes parsers and term rewriting (using source-to-source transformations) in which associativity is explicitly expressed. We'd write your grammar as:

Expr  ::= Term ;
[Associative Commutative] Expr  ::= Expr "+" Term ;
Expr  ::= Expr "-" Term ;
Term  ::= Factor ;
[Associative Commutative] Term  ::= Term "*" Factor ;
Term  ::= Term "/" Factor ;
Factor ::= INTEGER ;
Factor ::= "(" Expr ")" ;

The grammar seems longer and clumsier this way, but it in fact allows us to break out the special cases and mark them as needed. In particular, we can now distinguish operators that are associative from those that are not, and mark them accordingly. With that semantic marking, our tree-rewrite engine automatically accounts for associativity and commutativity. You can see a full example of such DMS rules being used to symbolically simplify high-school algebra, using explicit rewrite rules over a typical expression grammar that don't have to account for such semantic properties; that is built into the rewrite engine.
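One common way to let rewrite rules ignore associativity and commutativity is to put trees into a canonical form first: flatten nested applications of an AC operator into one n-ary node and sort its operands. The Python sketch below does that for tuple trees; it only illustrates the idea and is not how DMS is implemented or an API it provides. `AC_OPS` is a hypothetical stand-in for the `[Associative Commutative]` marks in the grammar above:

```python
# Hypothetical stand-in for the [Associative Commutative] marks: "+" and "*".
AC_OPS = {"+", "*"}

def normalize(tree):
    """Canonical form for tuple trees (op, arg, ...): nested uses of an AC operator
    are flattened into one n-ary node and its operands are sorted."""
    if not isinstance(tree, tuple):
        return tree                      # a leaf: variable name or number
    op, *args = tree
    args = [normalize(a) for a in args]  # children first, so they are already flat
    if op not in AC_OPS:
        return (op, *args)               # "-" and "/" keep their shape
    flat = []
    for a in args:
        if isinstance(a, tuple) and a[0] == op:
            flat.extend(a[1:])           # associativity: (+ (+ a b) c) -> (+ a b c)
        else:
            flat.append(a)
    flat.sort(key=repr)                  # commutativity: operand order is irrelevant
    return (op, *flat)

x = normalize(("+", ("+", "a", "b"), "c"))
y = normalize(("+", "a", ("+", "c", "b")))
print(x, x == y)
```

After normalization, `(+ (+ a b) c)` and `(+ a (+ c b))` become the same tuple `('+', 'a', 'b', 'c')`, so a rule only has to match one shape; `-` is left alone because it is not marked.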

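Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish one, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.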

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM