Antlr syntactic predicate - mismatched character

I have the following grammar:

SPACE : (' '|'\t'|'\n'|'\r')+ {$channel = HIDDEN;};
NAME_TAG : 'name';
IS_TAG : 'is';

START : 'START';
END : ('END START') => 'END START'  ;

WORD    : 'A'..'Z'+;

rule :  START NAME_TAG IS_TAG WORD END;

and I want to parse input such as "START name is END END START". The problem is the END token: the first 'END ' (a WORD followed by a SPACE) is misinterpreted. I thought a syntactic predicate on the END token would be the right approach, but maybe I am wrong.

I wouldn't create tokens that consist of two (or more) WORDs separated by spaces. Why not tokenize 'END' as an END token and then do something like this:

rule     : START NAME_TAG IS_TAG word END START;
word     : WORD | END; // expand this rule, as you see fit
NAME_TAG : 'name';
IS_TAG   : 'is';
START    : 'START';
END      : 'END';
WORD     : 'A'..'Z'+;
SPACE    : (' '|'\t'|'\n'|'\r')+ {$channel = HIDDEN;};

which would parse "START name is END END START" into the following parse tree:

[image: parse tree]
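To make the token-level idea concrete, here is a rough Python sketch of what the grammar above does. It is not the code ANTLR generates: the names `KEYWORDS`, `tokenize` and `matches_rule` are made up for illustration, and splitting on whitespace is a simplification of the hidden SPACE channel.

```python
import re

# Hypothetical keyword table mirroring the literal lexer rules of the grammar.
KEYWORDS = {"name": "NAME_TAG", "is": "IS_TAG", "START": "START", "END": "END"}

def tokenize(text):
    """Split on whitespace (standing in for the hidden SPACE channel)
    and classify each chunk as a keyword token or a WORD."""
    tokens = []
    for chunk in text.split():
        if chunk in KEYWORDS:
            tokens.append((KEYWORDS[chunk], chunk))
        elif re.fullmatch(r"[A-Z]+", chunk):
            tokens.append(("WORD", chunk))
        else:
            raise ValueError(f"no lexer rule matches {chunk!r}")
    return tokens

def matches_rule(tokens):
    """rule : START NAME_TAG IS_TAG word END START;   word : WORD | END;"""
    types = [token_type for token_type, _ in tokens]
    if len(types) != 6:
        return False
    return (types[:3] == ["START", "NAME_TAG", "IS_TAG"]
            and types[3] in ("WORD", "END")   # the 'word' parser rule
            and types[4:] == ["END", "START"])

print(matches_rule(tokenize("START name is END END START")))
```

Because 'END' is always a single END token, the decision whether it is a name or the closing keyword moves from the lexer to the parser, where the surrounding tokens are available.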

EDIT

What went wrong is that your lexer rule has no alternative to fall back on when the predicate fails, so the lexer cannot recover. Here's a proper use of a predicate:

rule     :  START NAME_TAG IS_TAG WORD END;

SPACE    : (' '|'\t'|'\n'|'\r')+ {$channel = HIDDEN;};
NAME_TAG : 'name';
IS_TAG   : 'is';
START    : 'START';
WORD     : ('END START')=> 'END START' {$type=END;}
         | 'A'..'Z'+
         ;

fragment END : ;
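The behaviour of that WORD rule can be sketched by hand in Python. This is only an illustration of "look ahead, then fall back", not ANTLR's generated lexer, and `lex_word` is a hypothetical helper name.

```python
import re

UPPER_RUN = re.compile(r"[A-Z]+")

def lex_word(text, pos):
    """Mimic: WORD : ('END START')=> 'END START' {$type=END;} | 'A'..'Z'+ ;
    Returns (token_type, matched_text, next_position)."""
    literal = "END START"
    # The "predicate": commit to the first alternative only if the
    # complete literal really is present at this position.
    if text.startswith(literal, pos):
        return ("END", literal, pos + len(literal))
    # Fallback alternative: a plain run of upper-case letters.
    match = UPPER_RUN.match(text, pos)
    if match:
        return ("WORD", match.group(), match.end())
    raise ValueError(f"mismatched character at position {pos}")

print(lex_word("END END START", 0))  # first END falls back to WORD
print(lex_word("END END START", 4))  # second END begins the 'END START' literal
```

Without the second branch, the first 'END ' in the input would have nowhere to go once the lookahead fails, which corresponds to the missing alternative in the original END rule.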

