Infinite recursion in ANTLR grammar

Question

I am writing a simple grammar to recognize some expressions. Here I am posting a reduced version of it, written just to simplify my explanation. This reduced version can recognize expressions like:

this is a text
[n]this is another text[/n]
[n][n]this is a compound expression[/n][/n]

My problem appears when I submit an expression like: [i]this should generate just a recognition exception[/n]

A recognition exception is thrown, but the parser then enters an infinite recursion: it matches '[', but when 'i' does not match, it gets lost. I think this happens because the text component of my grammar cannot contain square brackets. Here is the grammar:

grammar ErrorTest;

expression
    :    rawText EOF
    |    command EOF
    ;

rawText
    :    word+
    ;

word
    :    ESPACE* TEXT ESPACE*
    ;

command 
    :    simpleCommand
    |    compoundCommand
    ;

simpleCommand
    :    HELP
    ;

compoundCommand
    :    rawText
    |    BEGIN compoundCommand END
    ;

HELP   : '[help]';

BEGIN  : '[n]';
END    : '[/n]';

ESPACE : ' ';
TEXT   : ~(' '|'['|']')*;

How can I solve it?

Answer 1

word matches the empty string because in

word
    :    ESPACE* TEXT ESPACE*
    ;

TEXT matches the empty string which causes

rawText
    :    word+
    ;

to loop infinitely.
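
To see the zero-length match concretely, here is a small Python sketch. It is only an approximation: the two regular expressions mimic the character class of TEXT with * and with +, and match_len is a helper name made up for this illustration, not anything from ANTLR.

```python
import re

# Rough regex equivalents of the two TEXT rules (assumption: this mimics only
# the character class ~(' '|'['|']'), not the behavior of ANTLR's lexer).
TEXT_STAR = re.compile(r"[^ \[\]]*")
TEXT_PLUS = re.compile(r"[^ \[\]]+")

def match_len(pattern, text, pos):
    """Return how many characters `pattern` consumes at `pos`, or None if it fails."""
    m = pattern.match(text, pos)
    return None if m is None else m.end() - pos

source = "[i]oops[/n]"
star = match_len(TEXT_STAR, source, 0)  # succeeds, but consumes nothing
plus = match_len(TEXT_PLUS, source, 0)  # fails outright at '['
print(star, plus)
```

With * the match at '[' succeeds without consuming anything, so a loop such as word+ can keep succeeding without moving forward. With + there is simply no match at that position.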

Change

TEXT   : ~(' '|'['|']')*;

to

TEXT   : ~(' '|'['|']')+;

which will make your grammar only finitely ambiguous.

The way to think about this is that rawText can match the empty string in many ways:

Zero TEXT tokens
One TEXT token with length 0.
Two TEXT tokens with length 0.
Three TEXT tokens with length 0.
...

This manifests when the input has a syntax error (such as [i]) because the parser tries each of these alternatives to see whether any of them resolves the error.

To get rid of any quadratic behavior, you should really make the grammar completely unambiguous:

rawText : ign (word (ign word)*)? ign;
ign     : ESPACE*;
word    : TEXT;

The problem with the naive fix is that rawText can match "foo" in several ways:

TEXT("foo")
TEXT("fo"), ESPACE(""), TEXT("o")
TEXT("f"), ESPACE(""), TEXT("oo")
TEXT("f"), ESPACE(""), TEXT("o"), ESPACE(""), TEXT("o")
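
A related ambiguity shows up at the token level with the original word rule, even after the * to + change: a run of spaces between two TEXT tokens can be split between the trailing ESPACE* of one word and the leading ESPACE* of the next. The sketch below is a hypothetical Python helper (count_parses is a made-up name, not part of ANTLR) that counts those derivations for a token sequence, assuming word : ESPACE* TEXT ESPACE* and rawText : word+.

```python
from functools import lru_cache

def count_parses(tokens):
    """Count the ways `tokens` can be derived from word+ where
    word : ESPACE* TEXT ESPACE*  (tokens are the strings "TEXT" / "ESPACE")."""
    tokens = tuple(tokens)
    n = len(tokens)

    @lru_cache(maxsize=None)
    def count_from(i):
        # Leading ESPACE*: every space before the next TEXT must be taken here.
        while i < n and tokens[i] == "ESPACE":
            i += 1
        if i == n or tokens[i] != "TEXT":
            return 0
        i += 1  # consume TEXT
        total = 0
        while True:
            # The trailing ESPACE* stops here; what follows is EOF or more words.
            total += 1 if i == n else count_from(i)
            if i < n and tokens[i] == "ESPACE":
                i += 1  # hand one more space to this word's trailing ESPACE*
            else:
                return total

    return count_from(0)

print(count_parses(["TEXT", "ESPACE", "ESPACE", "TEXT"]))
```

For TEXT ESPACE ESPACE TEXT it reports 3 derivations, one for each way of dividing the two spaces. Under the ign-based rules, the two spaces in that input can only be matched by the single ign between the two words.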

Answer 2

Why not do something like this:

grammar Test;

expression
 : atom+ EOF
 ;

atom
 : TEXT
 | ESPACE
 | command
 ;

command 
 : simpleCommand
 | compoundCommand
 ;

simpleCommand
 : HELP
 ;

compoundCommand
 : BEGIN atom+ END
 ;

HELP   : '[help]';
BEGIN  : '[n]';
END    : '[/n]';
ESPACE : ' ';
TEXT   : ~(' '|'['|']')+;

which would parse input like

this is [n][n]a [help][n]compound[/n] expression[/n][/n]

into the following parse tree:

[Image: parse tree for the input above]
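
As a rough cross-check of the grammar, here is a hand-written Python recognizer that follows the same rules. It is a sketch and not ANTLR-generated code: tokenize, parse_atoms and recognize are names invented for this illustration, the lexer is approximated with one regular expression, and an unmatched character simply rejects the input instead of going through ANTLR's error reporting and recovery.

```python
import re

# One alternative per lexer rule; the bracketed keywords are tried before TEXT.
TOKEN_RE = re.compile(r"""
    (?P<HELP>\[help\])
  | (?P<BEGIN>\[n\])
  | (?P<END>\[/n\])
  | (?P<ESPACE>\ )
  | (?P<TEXT>[^ \[\]]+)
""", re.VERBOSE)

def tokenize(source):
    """Split `source` into token names; raise SyntaxError on an unmatched character."""
    tokens, pos = [], 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"no token matches at offset {pos}: {source[pos]!r}")
        tokens.append(m.lastgroup)
        pos = m.end()  # always advances: no alternative can match ""
    return tokens

def parse_atoms(tokens, pos):
    """atom+ : return the position after the atoms, or None on failure."""
    start = pos
    while pos < len(tokens) and tokens[pos] != "END":
        kind = tokens[pos]
        if kind in ("TEXT", "ESPACE", "HELP"):
            pos += 1
        elif kind == "BEGIN":
            # compoundCommand : BEGIN atom+ END
            pos = parse_atoms(tokens, pos + 1)
            if pos is None or pos >= len(tokens) or tokens[pos] != "END":
                return None
            pos += 1
        else:
            return None
    return pos if pos > start else None  # atom+ needs at least one atom

def recognize(source):
    """Return True if `source` matches expression : atom+ EOF."""
    try:
        tokens = tokenize(source)
    except SyntaxError:
        return False
    return parse_atoms(tokens, 0) == len(tokens)

print(recognize("this is [n][n]a [help][n]compound[/n] expression[/n][/n]"))
print(recognize("[i]this should generate just a recognition exception[/n]"))
```

recognize returns True for the compound input shown above and False for the [i]...[/n] input from the question. Because every token pattern consumes at least one character, the loops always make progress.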

solution1
6 ACCEPTED 2012-04-25 23:28:06

solution2
1 2012-04-26 07:09:58
