
antlr grammar definition

I am relatively new to compiler theory and wanted to create a grammar to parse some comparisons so that I can evaluate them later. I found ANTLR, which is a powerful tool for specifying grammars. From the theory I know that operators with higher precedence must be declared at deeper levels than operators with lower precedence. Additionally, if I want a rule to be left associative, I have to put the recursion on the left side of the rule. With that in mind, I created a basic grammar that uses &&, ||, !=, ==, <, >, <=, >=, (, ) and !:

start
 : orExpr
 ;

orExpr
 : orExpr OR andExpr
 | andExpr
 ;

andExpr
 : andExpr AND eqNotEqExpr
 | eqNotEqExpr
 ;

eqNotEqExpr
 : eqNotEqExpr NEQ compExpr
 | eqNotEqExpr EQ compExpr
 | compExpr
 ;

compExpr
 : compExpr LT notExpr
 | compExpr GT notExpr
 | compExpr LTEQ notExpr
 | compExpr GTEQ notExpr
 | notExpr
 ;

notExpr
 : NOT notExpr
 | parExpr
 ;

parExpr
 : OPAR orExpr CPAR
 | id
 ;

id
 : INT
 | FLOAT
 | TRUE
 | FALSE
 | ID
 | STRING
 | NULL
 ;

However, while searching the internet I found a different way to specify the grammar above, one that does not follow the rules I mentioned regarding operator precedence and left associativity:

start
 : expr
 ;

expr
 : NOT expr                             //notExpr
 | expr op=(LTEQ | GTEQ | LT | GT) expr //relationalExpr
 | expr op=(EQ | NEQ) expr              //equalityExpr
 | expr AND expr                        //andExpr
 | expr OR expr                         //orExpr
 | atom                                 //atomExpr
 ;

atom
 : OPAR expr CPAR //parExpr
 | (INT | FLOAT)  //numberAtom
 | (TRUE | FALSE) //booleanAtom
 | ID             //idAtom
 | STRING         //stringAtom
 | NULL           //nullAtom
 ;

Can someone explain why this way of defining the grammar also works? Is it because of some ANTLR-specific treatment, or is it a different kind of grammar definition?

Below are the lexer rules for the operators and identifiers used by the grammar:

OR : '||';
AND : '&&';
EQ : '==';
NEQ : '!=';
GT : '>';
LT : '<';
GTEQ : '>=';
LTEQ : '<=';
NOT : '!';

OPAR : '(';
CPAR : ')';

TRUE : 'true';
FALSE : 'false';
NULL : 'null';

ID
 : [a-zA-Z_] [a-zA-Z_0-9]*
 ;

INT
 : [0-9]+
 ;

FLOAT
 : [0-9]+ '.' [0-9]* 
 | '.' [0-9]+
 ;

STRING
 : '"' (~["\r\n] | '""')* '"'
 ;

COMMENT
 : '//' ~[\r\n]* -> skip
 ;

SPACE
 : [ \t\r\n] -> skip
 ;

OTHER
 : . 
 ;

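This is specific to ANTLR v4, which accepts directly left-recursive rules. Within such a rule, alternatives listed earlier get higher precedence, and binary operators are left associative unless marked with <assoc=right>.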

Under the hood, a rule like this one is rewritten into something equivalent to what you did manually, as part of the left-recursion elimination step. ANTLR does this as a convenience, because LL grammars cannot contain left-recursive rules: a direct conversion of such a rule into parser code would produce infinite recursion (a function that unconditionally calls itself).
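To make the infinite-recursion point concrete, here is a minimal Python sketch. It is not ANTLR output. It hand-translates the rule `orExpr : orExpr OR andExpr | andExpr ;` in two ways. The function names, the token representation (a list of strings) and the one-token `and_expr` stand-in are all assumptions made for illustration.

```python
# Tokens are plain strings; "||" plays the role of OR and every other
# token is treated as an operand (hypothetical, simplified lexer output).

def and_expr(tokens, pos):
    # Stand-in for the next precedence level: consume exactly one operand.
    return tokens[pos], pos + 1

def or_expr_naive(tokens, pos=0):
    # Direct translation of "orExpr : orExpr OR andExpr".
    # The function calls itself before consuming any token, so it never
    # makes progress and recurses until Python raises RecursionError.
    left, pos = or_expr_naive(tokens, pos)
    right, pos = and_expr(tokens, pos + 1)
    return ("||", left, right), pos

def or_expr_loop(tokens, pos=0):
    # Equivalent non-left-recursive form: orExpr : andExpr (OR andExpr)* ;
    left, pos = and_expr(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "||":
        right, pos = and_expr(tokens, pos + 1)
        # Folding into the left operand preserves left associativity.
        left = ("||", left, right)
    return left, pos
```

Calling `or_expr_loop(["a", "||", "b", "||", "c"])` yields the left-nested tree `("||", ("||", "a", "b"), "c")`, while `or_expr_naive` fails with `RecursionError` on any input.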

There is more information, including a transformation example, on the ANTLR documentation page about left-recursive rules.
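For a feel of how a single `expr` rule can still respect precedence, the following Python sketch uses precedence climbing, the general technique that this kind of rewrite is based on. It is a hand-written illustration, not the code ANTLR generates. The numeric precedence values, the function names and the list-of-strings token format are assumptions. The table mirrors the order of alternatives in the single-rule grammar from the question, with earlier alternatives binding tighter.

```python
# Binding strength of each binary operator (higher binds tighter).
BINARY_PREC = {
    "<": 4, ">": 4, "<=": 4, ">=": 4,   # relationalExpr
    "==": 3, "!=": 3,                   # equalityExpr
    "&&": 2,                            # andExpr
    "||": 1,                            # orExpr
}
NOT_PREC = 5  # NOT is the first alternative, so it binds tightest

def parse(tokens):
    tree, pos = parse_expr(tokens, 0, 0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

def parse_expr(tokens, pos, min_prec):
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    # Primary part: prefix NOT, parenthesised expression, or an atom.
    if tok == "!":
        operand, pos = parse_expr(tokens, pos + 1, NOT_PREC)
        left = ("!", operand)
    elif tok == "(":
        left, pos = parse_expr(tokens, pos + 1, 0)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        pos += 1
    elif tok == ")" or tok in BINARY_PREC:
        raise SyntaxError(f"unexpected token {tok!r}")
    else:
        left, pos = tok, pos + 1
    # Operator loop: this takes the place of the left-recursive alternatives.
    while pos < len(tokens):
        op = tokens[pos]
        prec = BINARY_PREC.get(op)
        if prec is None or prec < min_prec:
            break
        # Requiring prec + 1 on the right makes the operator left associative.
        right, pos = parse_expr(tokens, pos + 1, prec + 1)
        left = (op, left, right)
    return left, pos
```

With this sketch, `parse(["a", "||", "b", "&&", "c"])` groups `&&` first and returns `("||", "a", ("&&", "b", "c"))`, and `parse(["a", "==", "b", "==", "c"])` nests to the left as `("==", ("==", "a", "b"), "c")`.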
