
Can yacc be used to generate three address code for Java 1?

2020-04-15 17:15:19 | Tags: java, compiler-construction, yacc, lalr, intermediate-code

I have read that yacc generates bottom-up parsers for LALR(1) grammars. I have a grammar for Java 1 that can be used for generating three-address code and is strictly LALR(1), but the translation scheme I am employing makes it L-attributed. Now I have read that L-attributed LR grammars cannot be translated during bottom-up parsing. So, can yacc be used here or not? And if yes, how does yacc get around this problem?

1 Answer

You're not going to get a good answer unless you ask a specific, detailed question. Here's a vague sketch of an approach.

Synthesized attributes are obviously not a problem for a bottom-up parser, since they are computed in the final reduction action for the corresponding non-terminal. So the question comes down to "How can a bottom-up parser compute inherited attributes?"

Since the grammar is L-attributed, we know that any inherited attribute is computed from the attributes of its left siblings. Yacc/bison allows actions to be inserted anywhere in a right-hand side, and these "Mid-Rule Actions" (MRAs) are executed as they are encountered. An MRA has available to it precisely its left siblings, which is all that is needed to compute an inherited attribute.
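
As a minimal sketch of what an MRA can see, consider the hypothetical bison file below (the file name `mra.y`, the rule `triple`, and the token `NUM` are invented for illustration). The MRA sits between the second and third `NUM`, so it can read only `$1` and `$2`; it also occupies a position of its own, which is why the final action refers to its value as `$<num>3`.

```yacc
/* mra.y: hypothetical example; reads three integers, e.g. "2 3 4" */
%{
#include <stdio.h>

int yylex(void);
void yyerror(const char *msg) { fprintf(stderr, "%s\n", msg); }
%}

%union { int num; }
%token <num> NUM

%%
triple  : NUM NUM                   /* $1 $2: left siblings of the MRA    */
            { $<num>$ = $1 + $2; }  /* $3: the MRA, which sees $1 and $2  */
          NUM                       /* $4                                 */
            { printf("%d\n", $<num>3 * $4); }
        ;
%%
/* Minimal scanner: every whitespace-separated integer is a NUM. */
int yylex(void) {
    return scanf("%d", &yylval.num) == 1 ? NUM : 0;
}

int main(void) { return yyparse(); }
```

Assuming the file is saved as `mra.y`, something like `bison mra.y && cc mra.tab.c -o mra` should build it, and `echo "2 3 4" | ./mra` should print 20, that is (2 + 3) * 4.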

However, that doesn't show how the attribute can actually be inherited. An MRA inserted just before a grammar symbol in some rule can certainly be used to partially compute an inherited attribute of that symbol, but an inherited attribute can also use synthesized attributes of the children.

To accomplish that, we need to do two things:

1. Insert an MRA just before the non-terminal, which gathers together the left-sibling attributes. Since MRAs are also grammar symbols, this MRA will be the last left sibling, in effect the youngest uncle of the non-terminal's children. (You don't necessarily need an MRA; you can insert a "marker": a non-terminal whose only production is empty and whose action is the MRA body. But that's not so convenient because the action will have to get at the semantic values of the preceding grammar symbols. Or you could split the production into two pieces, so that both actions are final.)
2. Access the uncle's attributes in the non-terminal's reduction action.

Bison/yacc allow the second step by letting you use a non-positive symbol index to refer to slots in the parser stack. In particular, $0 refers to the symbol immediately preceding the non-terminal in the parent production (what I called the uncle above). Of course, for that to work, you have to ensure that the uncle is the same non-terminal (or at least has the same semantic type) in every production in which the non-terminal appears. This may require adding some markers.

Speaking of semantic values, you may be able to satisfy yourself that all the uncles of a given non-terminal are the same, or at least have the same type. But bison does not do this analysis, so it can't warn you if you get it wrong. Be careful. As another consequence, you have to tell bison what the type is, so you can't just write $0: you need $<tag>0.
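
The two steps can be sketched with a small declaration grammar. Everything here is an assumption made for illustration: the file name `decl.y`, the tokens `CONST`, `TYPE` and `ID`, and the helpers `copy`, `make_const` and `declare`. In the first `decl` rule the uncle of `id_list` is `TYPE` itself; in the second an MRA gathers `CONST` and `TYPE` into one value and becomes the uncle. Both have the semantic type `str`, so `id_list` can read either one as `$<str>0`. The `printf` in `declare` is a side effect that only serves to make the result visible.

```yacc
/* decl.y: hypothetical example for input such as  int a, b; const float c; */
%{
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int yylex(void);
void yyerror(const char *msg) { fprintf(stderr, "%s\n", msg); }

/* Hypothetical helper: copy a string to the heap (never freed here). */
static char *copy(const char *s) {
    char *p = malloc(strlen(s) + 1);
    strcpy(p, s);
    return p;
}

/* Hypothetical helper: build the type "const T" from T. */
static char *make_const(const char *type) {
    char *p = malloc(strlen(type) + sizeof "const ");
    sprintf(p, "const %s", type);
    return p;
}

/* Stand-in for recording a declaration; printing only makes it visible. */
static void declare(const char *name, const char *type) {
    printf("%s : %s\n", name, type);
}
%}

%union { char *str; }
%token CONST
%token <str> TYPE ID

%%
decls   : /* empty */
        | decls decl
        ;

decl    : TYPE id_list ';'                 /* the uncle is TYPE itself      */
        | CONST TYPE
            { $<str>$ = make_const($2); }  /* step 1: the MRA is the uncle  */
          id_list ';'
        ;

id_list : ID              { declare($1, $<str>0); }  /* step 2: read uncle */
        | id_list ',' ID  { declare($3, $<str>0); }
        ;
%%
/* Minimal scanner: words are CONST, TYPE or ID; ',' and ';' are literals. */
int yylex(void) {
    char buf[64];
    size_t n = 0;
    int c = getchar();

    while (c != EOF && isspace(c)) c = getchar();
    if (c == EOF) return 0;
    if (!isalpha(c)) return c;

    do {
        if (n < sizeof buf - 1) buf[n++] = (char)c;
        c = getchar();
    } while (c != EOF && isalnum(c));
    if (c != EOF) ungetc(c, stdin);
    buf[n] = '\0';

    if (strcmp(buf, "const") == 0) return CONST;
    yylval.str = copy(buf);
    if (strcmp(buf, "int") == 0 || strcmp(buf, "float") == 0) return TYPE;
    return ID;
}

int main(void) { return yyparse(); }
```

Assuming the file is saved as `decl.y`, `bison decl.y && cc decl.tab.c -o decl` should build it. Given the input `int a, b; const float c;` it should print `a : int`, `b : int` and `c : const float`, one per line. Note that the left-recursive rule `id_list : id_list ',' ID` can use `$<str>0` as well, because the slot below its first symbol on the stack is still the uncle.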

Note:

It is not always possible to handle inherited attributes in an L-attributed LR grammar, because at the moment in which the non-terminal is encountered, the parser may not yet know that the non-terminal will in fact form part of the parse tree. This problem does not occur in LL grammars, because in LL parsing the parser can only predict a non-terminal which is guaranteed to be present in the parse if the rest of the input is valid.

Any LL grammar can be parsed bottom-up, so there is no problem with L-attributed LL grammars. But the bottom-up parser can do better than that; it doesn't require that the full grammar be LL. Only those decision points for non-terminals which are about to be assigned an inherited attribute need to be LL-deterministic.

This restriction is enforced by the technique of placing an MRA or a marker immediately before the non-terminal. In other words, adding a marker (or an MRA) at certain points of an LR grammar might invalidate the LR property. There is a good discussion of this issue in the bison manual, so I won't elaborate on it here, except to observe one detail.

The technique outlined above for propagating inherited attributes uses MRAs (or markers) at strategic points to hold the inherited attributes. These productions must be reduced in order to proceed with the parse, so as noted in the above-mentioned section of the bison manual it may be necessary to rearrange the grammar in order to remove conflicts. In rare cases, this rewriting is not even possible.
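
A sketch of the kind of rearrangement involved, loosely modeled on the compound-statement example in the bison manual. The grammar, the file name `block.y` and the attribute name `has_locals` are assumptions, and the scanner is omitted because the file is only meant to be analyzed by bison. The commented-out version puts the MRA before `'{'`, where the parser cannot yet tell the two alternatives apart; moving the MRA after `'{'` lets the next token (`TYPE` or `ID`) decide.

```yacc
/* block.y: hypothetical grammar, scanner omitted; for analysis by bison */
%{
int yylex(void);
void yyerror(const char *msg);
%}

%union { int has_locals; }
%token TYPE ID

%%
/* Conflicting version: with '{' as the lookahead, the parser cannot tell
 * whether to reduce the empty MRA (first rule) or shift '{' (second rule).
 *
 *   block : { $<has_locals>$ = 1; } '{' decls stmts '}'
 *         | '{' stmts '}'
 *         ;
 */
block   : '{' { $<has_locals>$ = 1; } decls stmts '}'  /* MRA after '{' */
        | '{' stmts '}'
        ;

decls   : decl
        | decls decl
        ;

decl    : TYPE ID ';'
        ;

stmts   : stmt
        | stmts stmt
        ;

stmt    : ID ';'
        ;
```

Running `bison block.y` on the file as written should report no conflicts, while swapping in the commented-out `block` rules should make bison report one. The rearrangement works here only because declarations and statements start with different tokens; that is an assumption of this toy grammar, not something that holds in general.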

However, removing the conflict might still result in a grammar in which an inherited attribute is propagated in case some non-terminal needs the value, without any guarantee that the non-terminal will eventually be reduced. In this case, the inherited attribute will be needlessly computed and then later ignored. But that shouldn't be a problem. Inherent in the concept of attributes is the idea that attributes are functional; in other words, that the computation is free of side effects.

The absence of side effects means that an attribute grammar parser should be free to evaluate attributes in any order which respects attribute dependency. In particular, this means that you can trivially achieve correct evaluation of attributes by turning the attribute computations into continuations, a technique sometimes referred to as lazy evaluation or "thunking".

But there is always the temptation to use MRAs precisely in order to perform side effects. One very common such side effect is printing three-address code to the output stream. Another one is mutating persistent data structures such as symbol tables. That's no longer L-attributed parsing, and so the suggestions offered here might not work for such applications.