
Parsing DECAF grammar in ANTLR

I am creating a parser for DECAF with ANTLR:

grammar DECAF ;

//********* LEXER ******************
LETTER: ('a'..'z'|'A'..'Z') ;
DIGIT : '0'..'9' ;
ID : LETTER( LETTER | DIGIT)* ;
NUM: DIGIT(DIGIT)* ;
COMMENTS: '//' ~('\r' | '\n' )*  -> channel(HIDDEN);
WS : [ \t\r\n\f | ' '| '\r' | '\n' | '\t']+  ->channel(HIDDEN); 

CHAR: (LETTER|DIGIT|' '| '!' | '"' | '#' | '$' | '%' | '&' | '\'' | '(' | ')' | '*' | '+' 

| ',' | '-' | '.' | '/' | ':' | ';' | '<' | '=' | '>' | '?' | '@' | '[' | '\\' | ']' | '^' | '_' | '`'| '{' | '|' | '}' | '~' 
'\t'| '\n' | '\"' | '\'');


// ********** PARSER *****************

program : 'class' 'Program' '{' (declaration)* '}'  ;

declaration: structDeclaration| varDeclaration | methodDeclaration  ;

varDeclaration: varType ID ';' | varType ID '[' NUM ']' ';'  ;

structDeclaration : 'struct' ID '{' (varDeclaration)* '}'  ;

varType: 'int' | 'char' | 'boolean' | 'struct' ID | structDeclaration | 'void'  ;

methodDeclaration : methodType ID '(' (parameter (',' parameter)*)* ')' block  ;

methodType : 'int' | 'char' | 'boolean' | 'void' ;

parameter : parameterType ID | parameterType ID '[' ']' ;

parameterType: 'int' | 'char' | 'boolean'  ;

block : '{' (varDeclaration)* (statement)* '}' ;

statement : 'if' '(' expression ')' block ( 'else' block )? 
           | 'while' '(' expression ')' block
           |'return' expressionA ';' 
           | methodCall ';' 
           | block  
           | location '=' expression 
           | (expression)? ';'  ;

expressionA: expression | ;


location : (ID|ID '[' expression ']') ('.' location)?  ;

expression : location | methodCall | literal | expression op expression | '-' expression | '!' expression | '('expression')'  ;

methodCall :    ID '(' arg1 ')' ;

arg1    :   arg2 | ;

arg2    :   (arg) (',' arg)* ;

arg :   expression;

op: arith_op | rel_op | eq_op | cond_op  ;

arith_op : '+' | '-' | '*' | '/' | '%' ;

rel_op : '<' | '>' | '<=' | '>=' ;

eq_op : '==' | '!=' ;

cond_op : '&&' | '||' ;

literal : int_literal | char_literal | bool_literal ;

int_literal : NUM ;

char_literal : '\'' CHAR '\'' ;

bool_literal : 'true' | 'false' ;

When I give it the input:

    class Program {

    void main(){

        return 3+5 ;
    }
    }

The parse tree is not built correctly, because 3+5 is not recognized as an expression. Is there anything wrong with my grammar that is causing the problem?

Lexer rules are matched from top to bottom. The lexer prefers the rule that matches the most characters, and when 2 or more lexer rules match the same number of characters, the one defined first wins. Because of that, a single-digit integer is matched as a DIGIT instead of a NUM.

Try parsing the following instead:

class Program {
    void main(){    
        return 33 + 55 ;
    }
}

which will be parsed just fine. This is because 33 and 55 are matched as NUM tokens: NUM can now match 2 characters while DIGIT matches only 1, so NUM wins.

To fix it, make DIGIT a fragment (and LETTER as well):

fragment LETTER: ('a'..'z'|'A'..'Z') ;
fragment DIGIT : '0'..'9' ;
ID : LETTER( LETTER | DIGIT)* ;
NUM: DIGIT(DIGIT)* ;

Lexer fragments are only used internally by other lexer rules and never become tokens of their own.

A couple of other things: your WS rule matches way too much (it now also matches a | and a '). It should be:

WS : [ \t\r\n\f]+  ->channel(HIDDEN);

Also, you shouldn't match a char literal in your parser; do it in the lexer:

CHAR : '\'' ( ~['\r\n\\] | '\\' ['\\] ) '\'';

If you don't, the following will not be parsed properly:

class Program {
    void main(){
        return '1';
    }
}

because the 1 will be tokenized as a NUM and not as a CHAR.
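
For reference, here is a rough sketch of how the lexer section might look with all of the suggestions above applied. The lexer rules are the ones already shown in this answer; the `char_literal : CHAR ;` parser rule at the end is my own assumption about how the parser would then refer to the new token, since the quotes are now consumed by the lexer and no longer need to appear in the parser rule.

```antlr
//********* LEXER (sketch) ******************
// Fragments are helpers only; they never become tokens themselves.
fragment LETTER : ('a'..'z' | 'A'..'Z') ;
fragment DIGIT  : '0'..'9' ;

ID       : LETTER (LETTER | DIGIT)* ;
NUM      : DIGIT (DIGIT)* ;

// A whole char literal, quotes included, is a single token.
CHAR     : '\'' ( ~['\r\n\\] | '\\' ['\\] ) '\'' ;

COMMENTS : '//' ~('\r' | '\n')* -> channel(HIDDEN) ;
WS       : [ \t\r\n\f]+ -> channel(HIDDEN) ;

// ********** PARSER (assumed change) *****************
// Replaces: char_literal : '\'' CHAR '\'' ;
char_literal : CHAR ;
```

With this layout a lone `3` can only be a NUM, because DIGIT is no longer a candidate token, and `'1'` is matched as one CHAR token of three characters, which is longer than anything NUM could match at that position.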
