
ANTLR4 rule not ignoring standalone open bracket

The situation:

rule   : block+ ;
block  : '[' String ']' ;
String : ([a-z] | '[' | '\\]')+ ;

The catch is that String can contain [ without a backslash escape, but ] only with a backslash escape, so in this example:

[hello\]world][hello[[world]

The first block is parsed correctly, but in the second one the parser tries to find a matching ] for every [. Is there a way to tell the ANTLR parser to ignore these standalone [ characters? I can't change the format, so I need a workaround within ANTLR.

PS: Without ANTLR there is an algorithm that avoids this, something like: collect every [ in a queue until the first ] is found, then use only the head of the queue. But I really need ANTLR =_=
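For reference, the hand-written approach mentioned in the PS could look roughly like the Python sketch below. It is only one reading of the "queue" idea: instead of keeping a queue, it remembers whether a block is already open, so only the first [ opens it and every later [ counts as content. The function name split_blocks is made up for illustration, and unlike the String rule it does not restrict the content to [a-z].

```python
def split_blocks(text):
    """Return the contents of each [ ... ] block in text.

    Only the first '[' opens a block; any later '[' is ordinary content.
    A block closes at the first ']' that is not preceded by a backslash.
    """
    blocks = []
    current = None  # None means we are currently outside a block
    i = 0
    while i < len(text):
        ch = text[i]
        if current is None:
            # Outside a block, the only legal character is the opening '['.
            if ch != "[":
                raise ValueError(f"expected '[' at position {i}")
            current = []
        elif ch == "\\" and text[i + 1 : i + 2] == "]":
            # Escaped closing bracket: keep both characters as content.
            current.append("\\]")
            i += 1
        elif ch == "]":
            # Unescaped ']' closes the block that is currently open.
            blocks.append("".join(current))
            current = None
        else:
            # Everything else, including a standalone '[', is content.
            current.append(ch)
        i += 1
    if current is not None:
        raise ValueError("unterminated block")
    return blocks


if __name__ == "__main__":
    print(split_blocks(r"[hello\]world][hello[[world]"))
```
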

You can use Lexer modes.

Lexical modes allow us to split a single lexer grammar into multiple sublexers. The lexer can only return tokens matched by rules from the current mode.

You can read more about lexer rules in the ANTLR documentation.

First you will need to split your grammar into a separate lexer and parser. Then switch to another mode once you see an open bracket.

Parser grammar:

parser grammar TestParser;

options { tokenVocab=TestLexer; }

rul   : block+ ;
block  : LBR STRING RBR ;

Lexer grammar:

lexer grammar TestLexer;

LBR: '[' -> pushMode(InString);

mode InString;

STRING : ([a-z] | '\\]' | '[')+ ;
RBR: ']' -> popMode;

A working example is here.
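To see which tokens the two modes should produce for the input from the question, here is a small Python emulation of the lexer grammar. It is a sketch, not ANTLR itself: the MODES table and the tokenize function are hypothetical names, and the regular expressions are hand translations of the LBR, STRING and RBR rules. It takes the first rule that matches, which is enough here because within each mode the rules can never match at the same position; real ANTLR picks the longest match.

```python
import re

# Rules per mode as (token name, pattern, mode action), mirroring TestLexer.
# The action is ("push", mode), ("pop", None) or None.
MODES = {
    "DEFAULT": [
        ("LBR", re.compile(r"\["), ("push", "InString")),
    ],
    "InString": [
        ("STRING", re.compile(r"(?:[a-z]|\\\]|\[)+"), None),
        ("RBR", re.compile(r"\]"), ("pop", None)),
    ],
}


def tokenize(text):
    """Emulate the mode-switching lexer and return (name, text) pairs."""
    stack = ["DEFAULT"]  # the mode stack; the current mode is stack[-1]
    pos = 0
    tokens = []
    while pos < len(text):
        for name, pattern, action in MODES[stack[-1]]:
            match = pattern.match(text, pos)
            if match:
                break
        else:
            raise ValueError(f"no rule in mode {stack[-1]} matches at {pos}")
        tokens.append((name, match.group()))
        pos = match.end()
        if action is not None:
            kind, target = action
            if kind == "push":
                stack.append(target)  # like -> pushMode(InString)
            else:
                stack.pop()  # like -> popMode
    return tokens


if __name__ == "__main__":
    for token in tokenize(r"[hello\]world][hello[[world]"):
        print(token)
```

Because '[' is only an LBR token in the default mode, the extra '[' characters inside the second block end up inside a single STRING token, and the parser rule LBR STRING RBR matches both blocks.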

You can read the documentation on lexer modes

