How ANTLR creates tokens

Question

I have the following grammar and would like to understand why the input "a" is not matched. If I remove parser_a, the input is accepted. It also works if I remove the 'b' from the lexer rule A.
I would be grateful if you could explain this behavior.

grammar SmallTest;
options {
  language = Java;
}
@header {
  package test;
}
@lexer::header {
  package test;
}
start 
    : A EOF;
parser_a 
    : 'a' ;
A 
    : 'a' | 'b' ;

Here is the Java code I used to test my grammar, in case it helps:

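package test;

import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CharStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RecognitionException;

public class SmallTest {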
    public static void main(String[] args) throws RecognitionException {
        CharStream stream = new ANTLRStringStream("a");
        SmallTestLexer lexer = new SmallTestLexer(stream);
        CommonTokenStream tokenStream = new CommonTokenStream(lexer);
        SmallTestParser parser = new SmallTestParser(tokenStream);
        parser.start();
        System.out.println("done");
    }
}

Answer 1

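A : 'a' | 'b' is the lexer rule; it is meant to turn every 'a' and 'b' in the input into the token A.

The literal 'a' in the rule parser_a : 'a' ; does not refer to A, though. As far as I know, ANTLR 3 defines a separate implicit token for such a literal, and that token takes precedence over A. The input "a" is then no longer tokenized as A, so start : A EOF fails. If A matches only 'a', the literal is treated as another name for A, which would explain why removing 'b' also works.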

What you should write instead is:

start 
    : parser_a EOF;
parser_a 
    : A ;
A 
    : 'a' | 'b' ;

or simply

start 
    : A EOF;
A 
    : 'a' | 'b' ;

depending on what you need.

So the general idea is to tokenize everything first and then use the tokens in the parser rules. The grammar above combines lexer and parser rules in one file; maybe that is what is confusing you.
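
The two phases can be sketched without ANTLR at all. The toy class below is only an illustration, not what ANTLR generates: the names TokenizeThenParse, IMPLICIT_A, tokenize and parseStart are made up, and it assumes that a literal 'a' used in a parser rule produces its own token type that is tried before A.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenizeThenParse {
    // Token types of the toy lexer. IMPLICIT_A is a hypothetical stand-in for the
    // unnamed token assumed to be created for the literal 'a' in a parser rule.
    enum Type { IMPLICIT_A, A, EOF }

    // Phase 1: turn characters into token types. When withLiteralRule is true,
    // the implicit 'a' token is tried before A (assumed literal precedence).
    static List<Type> tokenize(String input, boolean withLiteralRule) {
        List<Type> tokens = new ArrayList<>();
        for (char c : input.toCharArray()) {
            if (c == 'a' && withLiteralRule) {
                tokens.add(Type.IMPLICIT_A);
            } else if (c == 'a' || c == 'b') {
                tokens.add(Type.A);
            } else {
                throw new IllegalArgumentException("no lexer rule matches '" + c + "'");
            }
        }
        tokens.add(Type.EOF);
        return tokens;
    }

    // Phase 2: the parser rule "start : A EOF" only ever sees token types,
    // never the original characters.
    static boolean parseStart(List<Type> tokens) {
        return tokens.size() == 2
                && tokens.get(0) == Type.A
                && tokens.get(1) == Type.EOF;
    }

    public static void main(String[] args) {
        // With the literal 'a' in the grammar, "a" becomes IMPLICIT_A, not A.
        System.out.println(parseStart(tokenize("a", true)));  // prints "false"
        // Without it, "a" becomes A and the start rule matches.
        System.out.println(parseStart(tokenize("a", false))); // prints "true"
    }
}
```

The parser never looks at characters, only at the token types the lexer has already chosen, so a mismatch between the two shows up as a parse error even though the input looks correct.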
