How to get line number in ANTLR3 tree-parser @init action

In ANTLR version 3, how can the line number be obtained in the @init action of a high-level tree-parser rule?

For example, in the @init action below, I'd like to push the line number along with the sentence text.

sentence
    @init { m_nodeVisitor.pushScriptContext( new MyScriptContext( $sentence.text )); }
    : assignCommand 
    | actionCommand;
    finally {
        m_nodeVisitor.popScriptContext();
    }

I need to push the context before the actions associated with the symbols in the rule are executed.

Some things that don't work:

  • Using $sentence.line: it's not defined, even though $sentence.text is.
  • Moving the push into the rule actions. Placed before the rule, no token in the rule is available. Placed after the rule, the action happens after the actions associated with the rule symbols.
  • Using this expression in the @init action, which compiles but returns the value 0: getTreeNodeStream().getTreeAdaptor().getToken( $sentence.start ).getLine(). EDIT: Actually, this does work if $sentence.start is either a real token or an imaginary token created with a reference token; see Bart Kiers' answer below.

It seems that if I can easily get the matched text and the first matched token in the @init action, there should be an easy way to get the line number as well.

You can look one step ahead in the tree node stream of a tree grammar with CommonTree ahead = (CommonTree)input.LT(1), which you can place in the @init section.

Every CommonTree (the default Tree implementation in ANTLR) has a getToken() method which returns the Token associated with this tree. And each Token has a getLine() method, which, not surprisingly, returns the line number of this token.

So, if you do the following:

sentence
@init {
  CommonTree ahead = (CommonTree)input.LT(1);
  int line = ahead.getToken().getLine();
  System.out.println("line=" + line);
}
  :  assignCommand 
  |  actionCommand
  ;

you should be able to see some correct line numbers being printed. I say "some" because this won't go as planned in all cases. Let me demonstrate using a simple example grammar:

grammar ASTDemo;

options { 
  output=AST;
}

tokens {
  ROOT;
  ACTION;
}

parse
  :  sentence+ EOF -> ^(ROOT sentence+)
  ;

sentence
  :  assignCommand 
  |  actionCommand
  ;

assignCommand
  :  ID ASSIGN NUMBER -> ^(ASSIGN ID NUMBER)
  ;

actionCommand
  :  action ID -> ^(ACTION action ID)
  ;

action
  :  START
  |  STOP
  ;

ASSIGN : '=';
START  : 'start';
STOP   : 'stop';
ID     : ('a'..'z' | 'A'..'Z')+;
NUMBER : '0'..'9'+;
SPACE  : (' ' | '\t' | '\r' | '\n')+ {skip();};

whose tree grammar looks like:

tree grammar ASTDemoWalker;

options {
  output=AST;
  tokenVocab=ASTDemo;
  ASTLabelType=CommonTree;
}


walk
  :  ^(ROOT sentence+)
  ;

sentence
@init {
  CommonTree ahead = (CommonTree)input.LT(1);
  int line = ahead.getToken().getLine();
  System.out.println("line=" + line);
}
  :  assignCommand 
  |  actionCommand
  ;

assignCommand
  :  ^(ASSIGN ID NUMBER)
  ;

actionCommand
  :  ^(ACTION action ID)
  ;

action
  :  START
  |  STOP
  ;

And if you run the following test class:

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;

public class Main {
  public static void main(String[] args) throws Exception {
    String src = "\n\n\nABC = 123\n\nstart ABC";
    ASTDemoLexer lexer = new ASTDemoLexer(new ANTLRStringStream(src));
    ASTDemoParser parser = new ASTDemoParser(new CommonTokenStream(lexer));
    CommonTree root = (CommonTree)parser.parse().getTree();
    ASTDemoWalker walker = new ASTDemoWalker(new CommonTreeNodeStream(root));
    walker.walk();
  }
}

you will see the following being printed:

line=4
line=0

As you can see, "ABC = 123" produced the expected output (line 4), but "start ABC" didn't (line 0). This is because the root of the tree created by the actionCommand rule is an ACTION token, and this token is never defined in the lexer, only in the tokens{...} block. Because it doesn't really exist in the input, line 0 is attached to it by default. If you want to change the line number, you need to provide a "reference" token as a parameter to this so-called imaginary ACTION token, from which it copies its attributes.

So, if you change the actionCommand rule in the combined grammar into:

actionCommand
  :  ref=action ID -> ^(ACTION[$ref.start] action ID)
  ;

the line number would be as expected (line 6).

Note that every parser rule has a start and a stop attribute, which reference the first and last token matched by the rule, respectively. If action were a lexer rule (say FOO), you could have omitted the .start from it:

actionCommand
  :  ref=FOO ID -> ^(ACTION[$ref] FOO ID)
  ;

Now the ACTION token has copied all attributes from whatever $ref is pointing to, except the type of the token, which is of course int ACTION. But this also means that it copied the text attribute, so in my example, the AST created by ref=action ID -> ^(ACTION[$ref.start] action ID) could look like:

            [text=start,type=ACTION]
                  /         \
                 /           \
                /             \
   [text=start,type=START]  [text=ABC,type=ID]

Of course, it's a proper AST because the types of the nodes are unique, but it makes debugging confusing since ACTION and START share the same .text attribute.

You can copy all attributes to an imaginary token except .text and .type by providing a second string parameter, like this:

actionCommand
  :  ref=action ID -> ^(ACTION[$ref.start, "Action"] action ID)
  ;
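
To make the difference between the three forms (ACTION, ACTION[$ref] and ACTION[$ref, "Action"]) concrete, here is a small stand-alone Java model of what ends up in the imaginary token. It is only a sketch of the behaviour described above: the Tok class and the imaginary(...) helpers are hypothetical stand-ins written for this illustration, not classes or methods from the ANTLR runtime.

```java
public class ImaginaryTokenModel {
    /** Hypothetical stand-in for a token; NOT an ANTLR class. */
    public static final class Tok {
        public final String type;
        public final String text;
        public final int line;

        public Tok(String type, String text, int line) {
            this.type = type;
            this.text = text;
            this.line = line;
        }

        @Override
        public String toString() {
            return "[type=" + type + ", text='" + text + "', line=" + line + "]";
        }
    }

    /** Models ACTION: there is nothing to copy from, so the line stays 0. */
    public static Tok imaginary(String type) {
        return new Tok(type, type, 0);
    }

    /** Models ACTION[$ref]: copy everything from ref except the type. */
    public static Tok imaginary(String type, Tok ref) {
        return new Tok(type, ref.text, ref.line);
    }

    /** Models ACTION[$ref, "Action"]: copy from ref except type and text. */
    public static Tok imaginary(String type, Tok ref, String text) {
        return new Tok(type, text, ref.line);
    }

    public static void main(String[] args) {
        // The real 'start' token from "start ABC" on line 6 of the test input.
        Tok start = new Tok("START", "start", 6);
        System.out.println(imaginary("ACTION"));
        System.out.println(imaginary("ACTION", start));
        System.out.println(imaginary("ACTION", start, "Action"));
    }
}
```

Running it prints [type=ACTION, text='ACTION', line=0], then [type=ACTION, text='start', line=6], then [type=ACTION, text='Action', line=6], which mirrors the three situations discussed in this answer.
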

And if you now run the Main test class again, you will see the following printed:

line=4
line=6

And if you inspect the tree that is generated, it'll look like this:

[type=ROOT, text='ROOT']
  [type=ASSIGN, text='=']
    [type=ID, text='ABC']
    [type=NUMBER, text='123']
  [type=ACTION, text='Action']
    [type=START, text='start']
    [type=ID, text='ABC']
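
If changing the parser grammar is not an option, one possible workaround (not part of the approach above, just a sketch) is to search the subtree in pre-order for the first node that carries a non-zero line. The Node class below is a hypothetical stand-in so the example runs on its own; with the ANTLR runtime you would presumably walk the tree through the Tree interface's getLine(), getChildCount() and getChild(int) instead.

```java
import java.util.Arrays;
import java.util.List;

public class FirstRealLine {
    /** Hypothetical stand-in for a tree node; NOT an ANTLR class. */
    public static final class Node {
        public final String text;
        public final int line;
        public final List<Node> children;

        public Node(String text, int line, Node... children) {
            this.text = text;
            this.line = line;
            this.children = Arrays.asList(children);
        }
    }

    /** Returns the first non-zero line found in pre-order, or 0 if none exists. */
    public static int firstRealLine(Node node) {
        if (node.line != 0) {
            return node.line;
        }
        for (Node child : node.children) {
            int line = firstRealLine(child);
            if (line != 0) {
                return line;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // Imaginary ACTION root (line 0) above the real tokens from line 6.
        Node action = new Node("ACTION", 0,
                new Node("start", 6),
                new Node("ABC", 6));
        System.out.println("line=" + firstRealLine(action));
    }
}
```

For the imaginary ACTION root built in main, this prints line=6, because the search skips the root and picks up the line of its first real child.
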
