
Parsing Decaf grammar in ANTLR4

I am writing ANTLR4 lexer and parser rules for the Decaf programming language. I am trying to run a parser test file and build its parse tree by printing the visited nodes in the terminal and pasting them into D3_parser_tree.html. According to this test file, the current parse tree is missing the number 10 and the right square bracket:

class program { int i [10]; }

The error I am getting: mismatched input '10' expecting INT_LITERAL

I am not sure why I get this error, since I have declared a lexer rule for INT_LITERAL and used it in a parser rule within field_decl, following the given Decaf spec:

** Parser rules **

<program> → class Program ‘{‘ <field_decl>* <method_decl>* ‘}’
<field_decl> → <type> { <id> | <id> ‘[‘ <int_literal> ‘]’ }+, ;
<method_decl> → { <type> | void } <id> ( [ { <type> <id> }+, ] ) <block>
<digit> → 0 | 1 | 2 | … | 9
<block> → ‘{‘ <var_decl>* <statement>* ‘}’
<literal> → <int_literal> | <char_literal> | <bool_literal>
<hex_digit> → <digit> | a | b | c | … | f | A | B | C | … | F
<int_literal> → <decimal_literal> | <hex_literal>
<decimal_literal> → <digit> <digit>*
<hex_literal> → 0x <hex_digit> <hex_digit>*

Related lexer rules:

NUMBER : [0-9]+;
fragment ALPHA : [_a-zA-Z0-9];
fragment DIGIT : [0-9];
fragment DECIMAL_LITERAL : DIGIT+;
CHAR_LITERAL : '\'' CHAR '\'';
STRING_LITERAL : '"' CHAR+ '"' ;
COMMENT : '//' ~('\n')* '\n' -> skip;
WS : (' ' | '\n' | '\t' | '\r') + -> skip;

Related parser rules:

program : CLASS VAR LCURLYBRACE field_decl*method_decl* RCURLYBRACE EOF;
field_decl : data_type field ( COMMA field )* SEMICOLON;

Please let me know if you need further details. I appreciate your help.

The following rules conflict:

VAR : ALPHA+;
...
NUMBER : [0-9]+;
...
INT_LITERAL : DECIMAL_LITERAL | HEX_LITERAL;

They all match 10 (VAR does because the ALPHA fragment includes the digits 0-9), but the lexer will always choose VAR since that is the rule defined first.

This is just how ANTLR's lexer works: it tries to match as many characters as possible, and when two (or more) rules match the same number of characters, the one defined first "wins".

You can confirm this by changing field to the following; the input then parses, because 10 is tokenized as VAR:

field : VAR | VAR LSQUAREBRACE VAR RSQUAREBRACE;
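
To make the tie-breaking rule concrete, here is a small Python sketch that imitates it. This is not ANTLR itself, only a simulation of "longest match, then first-defined rule". The helper first_token and the regular expressions are illustrative assumptions: the hex branch of INT_LITERAL is guessed from the spec's <hex_literal>, and adjusted_rules is just one possible reordering (identifiers may not start with a digit, INT_LITERAL comes first, NUMBER is dropped), not something taken from the original grammar.

```python
import re

def first_token(rules, text):
    """Pick the rule for the next token: longest match wins, ties go to the earliest rule."""
    best_name, best_len = None, 0
    for name, pattern in rules:
        m = re.match(pattern, text)
        # Strictly greater: a later rule only wins if it matches MORE characters.
        if m and len(m.group()) > best_len:
            best_name, best_len = name, len(m.group())
    if best_name is None:
        raise ValueError(f"no rule matches at {text!r}")
    return best_name, text[:best_len]

# Rule order as quoted in the answer. VAR is ALPHA+, and ALPHA includes digits.
original_rules = [
    ("VAR", r"[_a-zA-Z0-9]+"),
    ("NUMBER", r"[0-9]+"),
    ("INT_LITERAL", r"0x[0-9a-fA-F]+|[0-9]+"),  # hex branch is an assumption
]

# Hypothetical adjustment: identifiers must start with a letter or underscore,
# INT_LITERAL is listed first, and the overlapping NUMBER rule is removed.
adjusted_rules = [
    ("INT_LITERAL", r"0x[0-9a-fA-F]+|[0-9]+"),
    ("VAR", r"[_a-zA-Z][_a-zA-Z0-9]*"),
]

print(first_token(original_rules, "10];"))  # ('VAR', '10')
print(first_token(adjusted_rules, "10];"))  # ('INT_LITERAL', '10')
print(first_token(adjusted_rules, "i [10];"))  # ('VAR', 'i')
```

With the original order, all three rules match the two characters of 10, so the first one (VAR) is chosen and the parser never sees an INT_LITERAL. In the adjusted set, VAR can no longer match text that starts with a digit, so 10 comes out as INT_LITERAL.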
