
Reluctant matching in ANTLR 4.4

Just as reluctant quantifiers work in regular expressions, I'm trying to parse two different tokens from my input, i.e. one for operand1 and one for the operator. My operator token should be matched reluctantly, instead of the input being greedily consumed by the token for operand1.

Example input:

Active Indicator in ("A", "D", "S")

(To simplify, I have removed the code relevant to operand2.)

Expected operand1:

Active Indicator

Expected operator:

in

Actual output for operand1:

Active Indicator in

and nothing for the operator rule. Below is my grammar:

grammar Test;

condition: leftOperand WHITESPACE* operator;

leftOperand:  ALPHA_NUMERIC_WS ;
operator: EQUALS | NOT_EQUALS | IN | NOT_IN;

EQUALS  : '=';
NOT_EQUALS  : '!=';
IN  : 'in';
NOT_IN  : 'not' WHITESPACE 'in';

WORD: (LOWERCASE | UPPERCASE )+ ;
ALPHA_NUMERIC_WS:    WORD  ( WORD| DIGIT | WHITESPACE )* ( WORD | DIGIT)+ ;
WHITESPACE  : (' ' | '\t')+;

fragment DIGIT: '0'..'9' ;

LOWERCASE   : [a-z] ;
UPPERCASE   : [A-Z] ;

One solution would be to produce one token per word instead of a single token spanning several words.
Your grammar would then look like this:

grammar Test;

condition: leftOperand operator;

leftOperand:  ALPHA_NUMERIC+ ;
operator: EQUALS | NOT_EQUALS | IN | NOT_IN;

EQUALS  : '=';
NOT_EQUALS  : '!=';
IN  : 'in';
NOT_IN  : 'not' WHITESPACE 'in';

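// WORD is a fragment: as a token rule it would win the tie against ALPHA_NUMERIC
fragment WORD: (LOWERCASE | UPPERCASE)+ ;
ALPHA_NUMERIC:    WORD ( WORD | DIGIT )* ;
WHITESPACE  : (' ' | '\t')+ -> skip; // ignoring WS completely

fragment DIGIT: '0'..'9' ;

fragment LOWERCASE   : [a-z] ;
fragment UPPERCASE   : [A-Z] ;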

This way the lexer can no longer consume the whole input as a single token, the way ALPHA_NUMERIC_WS did, because any whitespace forces it to leave the ALPHA_NUMERIC rule. The input that follows then gets a chance to be matched by the other lexer rules. The lexer always picks the longest match, and when several rules match the same text, the one defined first in the grammar wins.
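To make the difference concrete, here is a minimal Python sketch that imitates the lexer's "longest match, first rule wins on a tie" behaviour with regular expressions. It is not ANTLR itself and only a rough model: the `tokenize` helper and the two rule tables are hypothetical, the regexes are hand-translated from the grammars above, and the second table assumes WORD acts only as a helper (fragment) for ALPHA_NUMERIC rather than as a token of its own.

```python
import re

def tokenize(text, rules):
    """Simplified model of a lexer: the longest match wins, and on a tie the
    rule listed first wins. Rules flagged skip=True produce no token."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None  # (rule name, matched text, skip flag)
        for name, pattern, skip in rules:
            m = re.compile(pattern).match(text, pos)
            # Strictly longer only, so an earlier rule keeps a tie.
            if m and m.end() > pos and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group(), skip)
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        name, lexeme, skip = best
        if not skip:
            tokens.append((name, lexeme))
        pos += len(lexeme)
    return tokens

# Hand-translated approximation of the grammar from the question.
ORIGINAL_RULES = [
    ("EQUALS", r"=", False),
    ("NOT_EQUALS", r"!=", False),
    ("IN", r"in", False),
    ("NOT_IN", r"not[ \t]+in", False),
    ("WORD", r"[A-Za-z]+", False),
    ("ALPHA_NUMERIC_WS", r"[A-Za-z]+[A-Za-z0-9 \t]*[A-Za-z0-9]+", False),
    ("WHITESPACE", r"[ \t]+", False),
]

# Approximation of the one-token-per-word grammar (WORD assumed to be a fragment).
REVISED_RULES = [
    ("EQUALS", r"=", False),
    ("NOT_EQUALS", r"!=", False),
    ("IN", r"in", False),
    ("NOT_IN", r"not[ \t]+in", False),
    ("ALPHA_NUMERIC", r"[A-Za-z][A-Za-z0-9]*", False),
    ("WHITESPACE", r"[ \t]+", True),  # corresponds to '-> skip'
]

if __name__ == "__main__":
    text = "Active Indicator in"
    print(tokenize(text, ORIGINAL_RULES))
    # [('ALPHA_NUMERIC_WS', 'Active Indicator in')]
    print(tokenize(text, REVISED_RULES))
    # [('ALPHA_NUMERIC', 'Active'), ('ALPHA_NUMERIC', 'Indicator'), ('IN', 'in')]
```

In the first table the whitespace-tolerant rule swallows the operator because it yields the longest match. In the second, "in" is matched equally long by IN and ALPHA_NUMERIC, and IN wins only because it is listed first.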
