In ANTLR, is there a way to get the parsed text of a CommonTree in AST mode?

Question

A simple example:

(grammar):

stat: ID '=' expr NEWLINE -> ^('=' ID expr);

expr: atom '+' atom -> ^('+' atom atom);

atom: INT | ID;

...

(input text): a = 3 + 5

The corresponding CommonTree for '3 + 5' contains a '+' token and two children (3, 5).

At this point, what is the best way to recover the original input text that parsed into this tree ('3 + 5')?

I have the text, position, and line number of the individual tokens in the CommonTree object, so in theory I could make sure that only whitespace tokens are discarded and piece the text back together from this information, but that looks error-prone.

Is there a better way to do this?

Answer 1

"Is there a better way to do this?"

Better, I don't know. There is another way, of course. You decide what's better.

Another option would be to create a custom AST node class (and a corresponding node adaptor) and add the matched text to this AST node during parsing. The trick here is to not use skip(), which discards the token in the lexer, but to put the token on the HIDDEN channel instead. For the parser rules the effect is the same, but the text these (hidden) tokens match is still available in the parser.
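To make the difference between skip() and the HIDDEN channel concrete, here is a minimal plain-Java sketch that does not use ANTLR at all. The class HiddenChannelSketch, its Tok type, and the DEFAULT/HIDDEN constants are hypothetical stand-ins, and the tiny hand-written lexer only assumes the digits, symbols, and spaces of the demo input. It shows that the parser sees the same tokens either way, while only the hidden-channel variant can rebuild the original text.

```java
import java.util.ArrayList;
import java.util.List;

public class HiddenChannelSketch {

    // Hypothetical channel numbers for this sketch (not ANTLR's constants).
    public static final int DEFAULT = 0;
    public static final int HIDDEN = 1;

    // Minimal token: the matched text plus the channel it was emitted on.
    public static final class Tok {
        final String text;
        final int channel;
        Tok(String text, int channel) { this.text = text; this.channel = channel; }
    }

    // Splits the input into digit runs, space runs and single-character symbols.
    // keepSpaces == false drops spaces entirely (the effect of skip());
    // keepSpaces == true keeps them in the buffer on the HIDDEN channel.
    public static List<Tok> lex(String src, boolean keepSpaces) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            int start = i;
            char c = src.charAt(i);
            if (c == ' ') {
                while (i < src.length() && src.charAt(i) == ' ') i++;
                if (keepSpaces) out.add(new Tok(src.substring(start, i), HIDDEN));
            } else if (Character.isDigit(c)) {
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                out.add(new Tok(src.substring(start, i), DEFAULT));
            } else {
                i++;
                out.add(new Tok(src.substring(start, i), DEFAULT));
            }
        }
        return out;
    }

    // What a parser would match against: default-channel tokens only.
    public static List<String> visibleToParser(List<Tok> toks) {
        List<String> out = new ArrayList<>();
        for (Tok t : toks) {
            if (t.channel == DEFAULT) out.add(t.text);
        }
        return out;
    }

    // Text rebuilt from every buffered token, hidden ones included.
    public static String bufferedText(List<Tok> toks) {
        StringBuilder sb = new StringBuilder();
        for (Tok t : toks) sb.append(t.text);
        return sb.toString();
    }

    public static void main(String[] args) {
        String source = "12    +  42 ;";
        List<Tok> skipped = lex(source, false);
        List<Tok> hidden = lex(source, true);
        System.out.println(visibleToParser(skipped)); // [12, +, 42, ;]
        System.out.println(visibleToParser(hidden));  // [12, +, 42, ;]
        System.out.println(bufferedText(skipped));    // 12+42;
        System.out.println(bufferedText(hidden));     // 12    +  42 ;
    }
}
```

Both variants hand the parser the same four tokens, so the grammar rules do not change; only the buffer that the matched text is read from differs.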

A quick demo: put these 3 files in a directory named demo:

demo/T.g

grammar T;

options {
  output=AST;
  ASTLabelType=XTree;
}

@parser::header {
  package demo;
  import demo.*;
}

@lexer::header {
  package demo;
  import demo.*;
}

parse
 : expr EOF -> expr
 ;

expr
@after{$expr.tree.matched = $expr.text;}
 : Int '+' Int ';' -> ^('+' Int Int)
 ;

Int
 : '0'..'9'+
 ;

Space
 : ' ' {$channel=HIDDEN;}
 ;

demo/XTree.java

package demo;

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;

public class XTree extends CommonTree {

  protected String matched;

  public XTree(Token t) {
    super(t);
    matched = null;
  }
}

demo/Main.java

package demo;

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;

public class Main {

  public static void main(String[] args) throws Exception {
    String source = "12    +  42 ;";
    TLexer lexer = new TLexer(new ANTLRStringStream(source));
    TParser parser = new TParser(new CommonTokenStream(lexer));
    parser.setTreeAdaptor(new CommonTreeAdaptor(){
      @Override
      public Object create(Token t) {
        return new XTree(t);
      }
    }); 
    XTree root = (XTree)parser.parse().getTree();
    System.out.println("tree    : " + root.toStringTree());
    System.out.println("matched : " + root.matched);    
  }
}

You can run this demo by opening a shell, cd-ing to the directory that holds the demo directory, and executing the following:

java -cp demo/antlr-3.3.jar org.antlr.Tool demo/T.g
javac -cp demo/antlr-3.3.jar demo/*.java
java -cp .:demo/antlr-3.3.jar demo.Main

which will produce the following output:

tree    : (+ 12 42)
matched : 12    +  42 ;

Answer 2

Another possibility is to use TokenRewriteStream, which has several toString() methods.

To borrow from @Bart Kiers' example demo/Main.java:

TokenRewriteStream tokens = new TokenRewriteStream(lexer);
TParser parser = new TParser(tokens);
...
tokens.toString(n.getTokenStartIndex(), n.getTokenStopIndex() + 1).trim()

So, given any node 'n' of your tree, calling toString() on the token stream as above will produce the input text that "generated" this node.
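The idea of mapping a node's token boundaries back to text can be illustrated without ANTLR. The following is a minimal plain-Java sketch, not the library's implementation: TokenRangeSketch and textBetween are hypothetical names, the token buffer is written out by hand for the question's input, and the sketch treats both indices as inclusive, which is an assumption made here rather than a statement about TokenRewriteStream.

```java
import java.util.Arrays;
import java.util.List;

public class TokenRangeSketch {

    // Joins the text of the buffered tokens from index start to index stop.
    // Both bounds are inclusive (an assumption of this sketch) and are
    // clamped to the size of the buffer.
    public static String textBetween(List<String> tokens, int start, int stop) {
        StringBuilder sb = new StringBuilder();
        for (int i = Math.max(start, 0); i <= stop && i < tokens.size(); i++) {
            sb.append(tokens.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hand-written token buffer for "a = 3 + 5\n"; the whitespace tokens
        // stay in the buffer, as they would on a hidden channel.
        List<String> tokens =
            Arrays.asList("a", " ", "=", " ", "3", " ", "+", " ", "5", "\n");

        // Hypothetical boundaries of the expr subtree:
        // its first token is "3" (index 4), its last token is "5" (index 8).
        int start = 4;
        int stop = 8;

        System.out.println(textBetween(tokens, start, stop)); // prints "3 + 5"
    }
}
```

Because the whitespace tokens are still in the buffer, the joined range reproduces the original spacing instead of a normalized form such as "3+5".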

Answer 1: 3 ACCEPTED 2012-10-17 20:27:28

Answer 2: 0 2016-12-13 20:10:16
