antlr grammar definition

Question

I am relatively new to compiler theory, and I wanted to create a grammar to parse some comparisons in order to evaluate them later. I found ANTLR, which is a powerful tool for specifying grammars. From what I have learned of the theory, operators with higher precedence must be declared at deeper levels than operators with lower precedence. Additionally, if I want a rule to be left-associative, I have to put the recursion on the left side of the rule. With that in mind, I have created a basic grammar that supports &&, ||, !=, ==, <, >, <=, >=, (, ) and !:

start
 : orExpr
 ;

orExpr
 : orExpr OR andExpr
 | andExpr
 ;

andExpr
 : andExpr AND eqNotEqExpr
 | eqNotEqExpr
 ;

eqNotEqExpr
 : eqNotEqExpr NEQ compExpr
 | eqNotEqExpr EQ compExpr
 | compExpr
 ;

compExpr
 : compExpr LT compExpr
 | compExpr GT compExpr
 | compExpr LTEQ compExpr
 | compExpr GTEQ compExpr
 | notExpr
 ;

notExpr
 : NOT notExpr
 | parExpr
 ;

parExpr
 : OPAR orExpr CPAR
 | id
 ;

id
 : INT
 | FLOAT
 | TRUE
 | FALSE
 | ID
 | STRING
 | NULL
 ;

However, while searching the internet, I found a different way to specify the above grammar, which does not follow the rules I mentioned regarding operator precedence and left associativity:

start
 : expr
 ;

expr
 : NOT expr                             //notExpr
 | expr op=(LTEQ | GTEQ | LT | GT) expr //relationalExpr
 | expr op=(EQ | NEQ) expr              //equalityExpr
 | expr AND expr                        //andExpr
 | expr OR expr                         //orExpr
 | atom                                 //atomExpr
 ;

atom
 : OPAR expr CPAR //parExpr
 | (INT | FLOAT)  //numberAtom
 | (TRUE | FALSE) //booleanAtom
 | ID             //idAtom
 | STRING         //stringAtom
 | NULL           //nullAtom
 ;

Can someone explain why this way of defining the grammar also works? Is it because of some special treatment by ANTLR, or is it a different type of grammar definition?

Below are the operator and identifier tokens defined for the grammar:

OR : '||';
AND : '&&';
EQ : '==';
NEQ : '!=';
GT : '>';
LT : '<';
GTEQ : '>=';
LTEQ : '<=';
NOT : '!';

OPAR : '(';
CPAR : ')';

TRUE : 'true';
FALSE : 'false';
NULL : 'null';

ID
 : [a-zA-Z_] [a-zA-Z_0-9]*
 ;

INT
 : [0-9]+
 ;

FLOAT
 : [0-9]+ '.' [0-9]* 
 | '.' [0-9]+
 ;

STRING
 : '"' (~["\r\n] | '""')* '"'
 ;

COMMENT
 : '//' ~[\r\n]* -> skip
 ;

SPACE
 : [ \t\r\n] -> skip
 ;

OTHER
 : . 
 ;

Answer 1

This is specific to ANTLR v4.

Under the hood, ANTLR rewrites a rule like your expr into something equivalent to what you did manually, as part of its left-recursion elimination step. ANTLR does this as a convenience, because LL grammars cannot contain left-recursive rules: translating such a rule directly into parser code would produce infinite recursion (a function that unconditionally calls itself). Operator precedence is then taken from the order of the alternatives, with earlier alternatives binding more tightly, and binary operators are left-associative by default.

There is more information, along with a transformation example, on the ANTLR documentation page about left recursion.
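To make the idea of the rewrite more concrete, here is a rough hand-written sketch in Python of a precedence-climbing loop, which is the general shape a left-recursive expression rule is turned into. This is not the code ANTLR generates. The names `parse`, `tokenize`, `BINARY_PRECEDENCE` and `NOT_PRECEDENCE`, the numeric precedence values, and the tuple-based tree are all made up for this illustration, and the toy tokenizer only handles identifiers, integers and the operators from the question.

```python
import re

# Hypothetical precedence table mirroring the order of alternatives in `expr`:
# alternatives listed earlier in the rule get a higher number (bind tighter).
BINARY_PRECEDENCE = {
    "<=": 4, ">=": 4, "<": 4, ">": 4,  # relationalExpr
    "==": 3, "!=": 3,                  # equalityExpr
    "&&": 2,                           # andExpr
    "||": 1,                           # orExpr
}
NOT_PRECEDENCE = 5  # `NOT expr` is the first alternative, so it binds tightest

TOKEN_RE = re.compile(r"\|\||&&|==|!=|<=|>=|[<>!()]|[A-Za-z_]\w*|\d+")


def tokenize(text):
    """Split the input into operator, parenthesis, identifier and integer tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


def parse(text):
    """Parse `text` into nested tuples such as ("&&", "a", "b") or ("!", "a")."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        return token

    def expr(min_prec):
        # Non-left-recursive part first: a prefix operator or an atom.
        token = advance()
        if token == "!":
            left = ("!", expr(NOT_PRECEDENCE))
        elif token == "(":
            left = expr(0)
            if advance() != ")":
                raise SyntaxError("expected ')'")
        elif token in BINARY_PRECEDENCE or token == ")":
            raise SyntaxError(f"unexpected token {token!r}")
        else:
            left = token
        # The left recursion becomes a loop guarded by a precedence check.
        while peek() in BINARY_PRECEDENCE and BINARY_PRECEDENCE[peek()] >= min_prec:
            op = advance()
            # Requiring a strictly higher precedence on the right-hand side
            # is what makes each binary operator left-associative.
            right = expr(BINARY_PRECEDENCE[op] + 1)
            left = (op, left, right)
        return left

    tree = expr(0)
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

With this sketch, `parse("a || b && c == d")` groups the `==` first, then the `&&`, then the `||`, and `parse("a && b && c")` nests to the left. That is the same result the layered grammar from the question produces, without needing one rule per precedence level.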

solution1
0 ACCEPTED 2017-10-10 11:15:02