简体   繁体   中英

Parsing Decaf grammar in Antlr4

I am creating parser and lexer rules for Decaf programming language written in ANTLR4. There is a parser test file I am trying to run to get the parser tree for it by printing the visited nodes on the terminal window and paste them into D3_parser_tree.html class.The current parser tree is missing the right square brackets with the number 10 according to this testing file: class program { int i [10]; } class program { int i [10]; } 解析器树

The error I am getting: mismatched input '10' expecting INT_LITERAL

I am not sure why I am getting this error although I have declared a lexer rule for INT_LITERAL and then called it in a parser rule within field_decl according to the given Decaf spec:

** Parser rules **

<program> → class Program ‘{‘ <field_decl>* <method_decl>* ‘}’
<field_decl> → <type> { <id> | <id> ‘[‘ <int_literal> ‘]’ }+, ;
<method_decl> → { <type> | void } <id> ( [ { <type> <id> }+, ] ) <block>
<digit> → 0 | 1 | 2 | … | 9
<block> → ‘{‘ <var_decl>* <statement>* ‘}’
<literal> → <int_literal> | <char_literal> | <bool_literal>
<hex_digit> → <digit> | a | b | c | … | f | A | B | C | … | F
<int_literal> → <decimal_literal> | <hex_literal>
<decimal_literal> → <digit> <digit>*
<hex_literal> → 0x <hex_digit> <hex_digit>*

Related Lexer rules:

NUMBER : [0-9]+;
fragment ALPHA : [_a-zA-Z0-9];
fragment DIGIT : [0-9];
fragment DECIMAL_LITERAL : DIGIT+;
CHAR_LITERAL : '\'' CHAR '\'';
STRING_LITERAL : '"' CHAR+ '"' ;
COMMENT : '//' ~('\n')* '\n' -> skip;
WS : (' ' | '\n' | '\t' | '\r') + -> skip;

Related Parser rules:

program : CLASS VAR LCURLYBRACE field_decl*method_decl* RCURLYBRACE EOF;
field_decl : data_type field ( COMMA field )* SEMICOLON;

Please let me know if you need further details & I appreciate your help a lot.

The following rules conflict:

VAR : ALPHA+;
...
NUMBER : [0-9]+;
...
INT_LITERAL : DECIMAL_LITERAL | HEX_LITERAL;

They all match 10 , but the lexer will always choose VAR since that is the rule defined first.

This is just how ANTLR's lexer works: it tries to match the most characters as possible, and when two (or more) rules all match the same amount of characters, the one defined first "wins".

You will see that it parses correctly if you change field into:

field : VAR | VAR LSQUAREBRACE VAR RSQUAREBRACE;

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM