Resolving Lexer and Parser ambiguities in ANTLR4
In ANTLR4 I have a lexer rule that matches any word made of any characters except spaces, tabs, line breaks, colons and commas. It is defined as this:
WORD : ~[ \t\r\n:,]+;
I also have a lexer rule (defined before WORD) for switching to an EVAL mode:
OPENEVAL : '${' -> pushMode(EVAL);
mode EVAL;
CLOSEEVAL : '}' -> popMode;
... (more lexer definitions for EVAL mode) ...
In the parser file I'm trying to match either a grammar rule or a word, so I do the following:
eval : evaluation
| WORD;
evaluation : OPENEVAL somestuff CLOSEEVAL;
somestuff uses lexer rules defined in the EVAL mode. The problem is that, when parsing the eval rule, the input is recognized as a WORD token and not as an evaluation grammar rule. For example, if I enter some text like:
${stuff to be evaluated}
It should go to the evaluation rule, but instead it is identified as a WORD (taking only the "${stuff" part).
I know that there is an ambiguity between evaluation and WORD, but I thought that ANTLR would take the first matching alternative of the parser rule (evaluation in this case).
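The parser has no say in this choice: the lexer tokenizes the input on its own, tries every rule that is active at the current position and keeps the longest match. Rule order (OPENEVAL before WORD) only decides between matches of the same length. The following is a minimal hand-written Java sketch of that decision for the two default-mode rules above. It is not ANTLR's generated lexer, and the class and method names are hypothetical. For the example input, WORD matches 7 characters (${stuff) while OPENEVAL matches only 2, so WORD wins.

```java
/**
 * Hypothetical, hand-written sketch (not ANTLR-generated code) of how the
 * default-mode lexer picks a token: each rule is tried at the current
 * position, the longest match wins, and rule order only breaks ties.
 */
public class LongestMatchDemo {

    // OPENEVAL : '${' ;  matches exactly two characters, or nothing.
    static int matchOpenEval(String s, int pos) {
        return s.startsWith("${", pos) ? 2 : 0;
    }

    // Original rule  WORD : ~[ \t\r\n:,]+ ;  counts characters outside the set.
    static int matchWord(String s, int pos) {
        int i = pos;
        while (i < s.length() && " \t\r\n:,".indexOf(s.charAt(i)) < 0) {
            i++;
        }
        return i - pos;
    }

    /** Returns "RULE:text" for the token the lexer would emit at pos. */
    static String nextToken(String s, int pos) {
        int open = matchOpenEval(s, pos);
        int word = matchWord(s, pos);
        // Longest match wins; on a tie the earlier rule (OPENEVAL) wins.
        if (open > 0 && open >= word) {
            return "OPENEVAL:" + s.substring(pos, pos + open);
        }
        if (word > 0) {
            return "WORD:" + s.substring(pos, pos + word);
        }
        return "NO MATCH";
    }

    public static void main(String[] args) {
        System.out.println(nextToken("${stuff to be evaluated}", 0)); // WORD:${stuff
        System.out.println(nextToken("${ x}", 0));                    // OPENEVAL:${ (tie)
    }
}
```

Only when WORD cannot get past the ${ (for example "${ x}", where the space stops WORD after 2 characters) does the tie go to OPENEVAL.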
Sorry if this is too confusing. I tried to summarize it as well as possible (I didn't want to paste the full parser and lexer contents, to avoid a wall of text).
Another option I considered was to define WORD as anything except text surrounded by ${ and }, but I don't know how to write such a lexer rule.
How could I solve this? How can I distinguish between evaluation and WORD?
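You need to include a predicate preventing the inclusion of $ in a WORD when it is followed by {. The lexer chooses tokens without any knowledge of the parser rules, so the order of the alternatives in eval cannot influence which token is produced; the fix has to go into the WORD lexer rule itself: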
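WORD
    : ( ~[ \t\r\n:,$]                // any character except whitespace, ':', ',' and '$'
      | '$' {_input.LA(1) != '{'}?   // a '$' only when the next character is not '{'
      )+
    ;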