Parsing Decaf grammar in ANTLR4
I am writing ANTLR4 parser and lexer rules for the Decaf programming language. I am trying to run a parser test file and get its parse tree by printing the visited nodes to the terminal and pasting them into the D3_parser_tree.html class.
The current parse tree is missing the number 10 and the right square bracket for this test file:
class program { int i [10]; }
The error I am getting:
mismatched input '10' expecting INT_LITERAL
I am not sure why I get this error, since I have declared a lexer rule for INT_LITERAL and referenced it in a parser rule within field_decl, following the given Decaf spec:
** Parser rules **
<program> → class Program ‘{‘ <field_decl>* <method_decl>* ‘}’
<field_decl> → <type> { <id> | <id> ‘[‘ <int_literal> ‘]’ }+, ;
<method_decl> → { <type> | void } <id> ( [ { <type> <id> }+, ] ) <block>
<digit> → 0 | 1 | 2 | … | 9
<block> → ‘{‘ <var_decl>* <statement>* ‘}’
<literal> → <int_literal> | <char_literal> | <bool_literal>
<hex_digit> → <digit> | a | b | c | … | f | A | B | C | … | F
<int_literal> → <decimal_literal> | <hex_literal>
<decimal_literal> → <digit> <digit>*
<hex_literal> → 0x <hex_digit> <hex_digit>*
Related lexer rules:
NUMBER : [0-9]+;
fragment ALPHA : [_a-zA-Z0-9];
fragment DIGIT : [0-9];
fragment DECIMAL_LITERAL : DIGIT+;
CHAR_LITERAL : '\'' CHAR '\'';
STRING_LITERAL : '"' CHAR+ '"' ;
COMMENT : '//' ~('\n')* '\n' -> skip;
WS : (' ' | '\n' | '\t' | '\r') + -> skip;
Related parser rules:
program : CLASS VAR LCURLYBRACE field_decl* method_decl* RCURLYBRACE EOF;
field_decl : data_type field ( COMMA field )* SEMICOLON;
Please let me know if you need further details. I appreciate your help a lot.
The following rules conflict:
VAR : ALPHA+;
...
NUMBER : [0-9]+;
...
INT_LITERAL : DECIMAL_LITERAL | HEX_LITERAL;
They all match 10 (VAR does too, because ALPHA includes the digits 0-9), but the lexer will always choose VAR, since that is the rule defined first.
This is just how ANTLR's lexer works: it tries to match as many characters as possible, and when two (or more) rules match the same number of characters, the one defined first "wins".
You will see that it parses correctly if you change field into:
field : VAR | VAR LSQUAREBRACE VAR RSQUAREBRACE;
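To make the tie-breaking rule concrete, here is a small Python imitation of "longest match, then first rule defined". It is only a sketch of the behavior described above, not ANTLR's implementation. The helper first_token, the regular expressions standing in for the lexer rules (the hex pattern is taken from the spec's hex_literal), and the adjusted rule order are all assumptions made for illustration.

```python
import re


def first_token(rules, text):
    """Return (rule_name, lexeme) for the first token of text.

    rules is an ordered list of (name, regex) pairs; the list order
    stands in for the order of the rules in the grammar file.
    """
    best_name, best_lexeme = None, ""
    for name, pattern in rules:
        match = re.match(pattern, text)
        # A strictly longer match wins; on a tie the earlier rule is kept.
        if match and len(match.group()) > len(best_lexeme):
            best_name, best_lexeme = name, match.group()
    return best_name, best_lexeme


# Same order as in the answer. ALPHA includes digits, so VAR also matches "10".
original_rules = [
    ("VAR", r"[_a-zA-Z0-9]+"),
    ("NUMBER", r"[0-9]+"),
    ("INT_LITERAL", r"0x[0-9a-fA-F]+|[0-9]+"),
]

# Hypothetical adjustment (not from the original grammar): identifiers may
# not start with a digit, and INT_LITERAL is listed before NUMBER.
adjusted_rules = [
    ("INT_LITERAL", r"0x[0-9a-fA-F]+|[0-9]+"),
    ("VAR", r"[_a-zA-Z][_a-zA-Z0-9]*"),
    ("NUMBER", r"[0-9]+"),
]

print(first_token(original_rules, "10]"))  # ('VAR', '10')
print(first_token(adjusted_rules, "10]"))  # ('INT_LITERAL', '10')
```

With the original order, all three rules match the two characters of 10 and VAR is kept because it comes first, which is why the parser reports that it expected INT_LITERAL. In the adjusted list, VAR no longer matches a digit string at all, and INT_LITERAL wins the remaining tie against NUMBER.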