简体   繁体   中英

Infinite recursion in ANTLR grammar

I writing a simple grammar to recognize some expressions. Here, I'm posting a simpler version of it, that I wrote just to simplify my explanation. This simpler version can recognize expressions like:

  1. this is a text
  2. [n]this is another text[/n]
  3. [n][n]this is a compound expression[/n][/n]

My problem is when I sumbmit a expression like: [i]this should generate just a recognition exception[/n]

A recognition exception is thrown, but the parser enters in a infinte recursion, because it matches '[', but when it not matches 'i' it loses itself. I think that is happening because my text component of the grammar can not contain square brackets. So, I'm posting the grammar.

grammar ErrorTest;

expression
    :    rawText EOF
    |    command EOF
    ;

rawText
    :    word+
    ;

word
    :    ESPACE* TEXT ESPACE*
    ;

command 
    :    simpleCommand
    |    compoundCommand
    ;

simpleCommand
    :    HELP
    ;

compoundCommand
    :    rawText
    |    BEGIN compoundCommand END
    ;

HELP   : '[help]';

BEGIN  : '[n]';
END    : '[/n]';

ESPACE : ' ';
TEXT   : ~(' '|'['|']')*;

How can I solve it?

word matches the empty string because in

word
    :    ESPACE* TEXT ESPACE*
    ;

TEXT matches the empty string which causes

rawText
    :    word+
    ;

to loop infinitely.

Change

TEXT   : ~(' '|'['|']')*;

to

TEXT   : ~(' '|'['|']')+;

which will make your grammar only finitely ambiguous.

The way to think about this is that rawText can match the empty string in many ways

  1. Zero TEXT tokens
  2. One TEXT token with length 0.
  3. Two TEXT tokens with length 0.
  4. Three TEXT tokens with length 0.
  5. ...

This manifests when you have a syntactic error ( [i] ) because it tries each of these alternatives to see if any of them resolve the error.


To get rid of any quadratic behavior, you should really make it completely unambiguous.

rawText : ign (word (ign word)*)? ign;
ign     : ESPACE*;
word    : TEXT;

The problem with the naive fix is that rawText can match "foo" in several ways:

  1. TEXT("foo")
  2. TEXT("fo"), ESPACE(""), TEXT("o")
  3. TEXT("f"), ESPACE(""), TEXT("oo")
  4. TEXT("f"), ESPACE(""), TEXT("o"), ESPACE(""), TEXT("o")

Why not do something like this:

grammar Test;

expression
 : atom+ EOF
 ;

atom
 : TEXT
 | ESPACE
 | command
 ;

command 
 : simpleCommand
 | compoundCommand
 ;

simpleCommand
 : HELP
 ;

compoundCommand
 : BEGIN atom+ END
 ;

HELP   : '[help]';
BEGIN  : '[n]';
END    : '[/n]';
ESPACE : ' ';
TEXT   : ~(' '|'['|']')+;

which would oparse input like

this is [n][n]a [help][n]compound[/n] expression[/n][/n]

into the following parse tree:

在此输入图像描述

(click image to enlarge)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM