
How Antlr creates tokens

I have the following grammar and would like to understand why the input "a" is not matched. If I remove the parser_a rule, the input is accepted. It also works if I remove the 'b' alternative from the lexer rule A.
I would be grateful if you could explain this behavior.

grammar SmallTest;
options {
  language = Java;
}
@header {
  package test;
}
@lexer::header {
  package test;
}
start 
    : A EOF;
parser_a 
    : 'a' ;
A 
    : 'a' | 'b' ;

Here is the Java code I used to test my grammar:

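package test;

import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CharStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RecognitionException;
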
public class SmallTest {
    public static void main(String[] args) throws RecognitionException {
        CharStream stream = new ANTLRStringStream("a");
        SmallTestLexer lexer = new SmallTestLexer(stream);
        CommonTokenStream tokenStream = new CommonTokenStream(lexer);
        SmallTestParser parser = new SmallTestParser(tokenStream);
        parser.start();
        System.out.println("done");
    }
}

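A : 'a' | 'b' is a lexer rule. The lexer runs first and turns the input characters into tokens; the parser only ever sees those tokens. The literal 'a' in parser_a also defines a token: ANTLR creates an implicit token for it, so the grammar ends up with two token types that can match the input "a". The input is then tokenized as the implicit token instead of A, which is why start : A EOF fails. When A consists of 'a' alone, the literal and A refer to the same token type, so there is no conflict.

Mixing the literal 'a' in a parser rule with a lexer rule A that also matches 'a' is therefore what breaks the grammar.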

What you should write instead is:

start 
    : parser_a EOF;
parser_a 
    : A ;
A 
    : 'a' | 'b' ;

Or simply:

start 
    : A EOF;
A 
    : 'a' | 'b' ;

Which one to use depends on whether you need the separate parser_a rule.

So the general idea is to tokenize everything first and then refer to those tokens in the parser rules. The grammar above combines lexer and parser rules in one file; maybe that is what is confusing you.
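
To make the two phases concrete, here is a toy model in plain Java. It is not ANTLR's implementation, only a sketch of the idea: a lexer that tries its token rules in order, followed by a parser that looks at nothing but token types. The class name, the method names, and the token name LIT_A (standing in for the implicit token of the literal 'a') are all made up for illustration, and the sketch assumes that the implicit token is tried before A.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: this is NOT how ANTLR is implemented.
class TwoPhaseSketch {

    // Phase 1: turn each input character into a token type.
    // When implicitLiteralA is true, a hypothetical implicit token LIT_A
    // (for the literal 'a' used in a parser rule) is tried before A.
    static List<String> tokenize(String input, boolean implicitLiteralA) {
        List<String> types = new ArrayList<>();
        for (char c : input.toCharArray()) {
            if (implicitLiteralA && c == 'a') {
                types.add("LIT_A");          // implicit token wins for 'a'
            } else if (c == 'a' || c == 'b') {
                types.add("A");              // A : 'a' | 'b' ;
            } else {
                throw new IllegalArgumentException("no token rule matches '" + c + "'");
            }
        }
        types.add("EOF");
        return types;
    }

    // Phase 2: start : A EOF ;  (only token types are compared)
    static boolean parseStart(List<String> types) {
        return types.equals(List.of("A", "EOF"));
    }

    public static void main(String[] args) {
        // With the literal 'a' in a parser rule: prints "[LIT_A, EOF] -> false"
        System.out.println(tokenize("a", true) + " -> " + parseStart(tokenize("a", true)));
        // Without it: prints "[A, EOF] -> true"
        System.out.println(tokenize("a", false) + " -> " + parseStart(tokenize("a", false)));
    }
}
```

In this model the parser never gets a chance to reinterpret the character 'a'; once the lexer has labeled it, only that label counts. That is why the fix is to refer to A in parser rules rather than to the literal.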
