ANTLR parser for custom requirement
I have a very peculiar requirement for parsing input with ANTLR. I would like to be able to parse expressions like:
Correct inputs:
Invalid inputs:
Well, any expression that comes after `| EATALL` and before `| EATALL` (if any) must be obtained as a single token. For other, simple inputs, where `| EATALL` does not appear, only a valid combination of `_`, `-` and `[a-zA-Z0-9]` is tokenized as one token. In pseudocode:
This already looks like an ambiguous tokenization case to me. I would appreciate your suggestions on dealing with problems like this in ANTLR. Thanks in advance.
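The tokenization rule described in this question can be made concrete with a minimal sketch of one possible reading, written as a plain hand-rolled Python tokenizer rather than ANTLR. The question's example inputs are missing, so several details are assumptions: text outside the markers is discarded, a missing closing marker means "up to the end of the line", the eaten text is trimmed of surrounding blanks, and simple inputs are split on whitespace. The name `tokenize_line` is hypothetical.

```python
import re

MARK = "| EATALL"
# Assumption: a simple token is one or more characters from [A-Za-z0-9_-].
SIMPLE = re.compile(r"[A-Za-z0-9_-]+")


def tokenize_line(line: str) -> list[str]:
    """Tokenize one line under one (assumed) reading of the requirement.

    If MARK occurs, everything after the first MARK and before the next
    MARK (or the end of the line, if there is no second MARK) becomes a
    single token; text outside the markers is dropped. Otherwise the line
    is split on whitespace and every word must be a valid simple token.
    """
    start = line.find(MARK)
    if start != -1:
        body_start = start + len(MARK)
        end = line.find(MARK, body_start)
        if end == -1:  # no closing marker: eat to the end of the line
            end = len(line)
        return [line[body_start:end].strip()]
    words = line.split()
    for word in words:
        if SIMPLE.fullmatch(word) is None:
            raise ValueError(f"invalid simple token: {word!r}")
    return words
```

For instance, `tokenize_line("x | EATALL a + b * (c) | EATALL y")` returns `["a + b * (c)"]`, while `tokenize_line("foo$bar")` raises `ValueError` because `$` is not allowed outside an EATALL region.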
So, what have you tried? Is your question specific to ANTLR 3, or can you use ANTLR 4?
For ANTLR 3, you can use semantic predicates to condition token rule selection. ANTLR 4 drops ANTLR 3's syntactic and gated predicates, but its lexer rules still accept semantic predicates and actions written as native code in the target language, which achieve essentially the same result. For example (untested):
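```antlr
lexer grammar eatall ;

// Untested sketch. isCurrentLineJustTEXTandWS(), emitEachTEXTasNewValidSimpleToken()
// and trimIgnoreLRTextfromTokenText() are helper methods you would write yourself
// in an @members block; they are not part of the ANTLR runtime.

// Option 1: gate the rule with a semantic predicate.
ValidSimple : { isCurrentLineJustTEXTandWS() }? TEXT ;

// Option 2 (use instead of Option 1; two rules cannot share a name):
// ValidSimple : TEXT ( WS TEXT )* EOL? { emitEachTEXTasNewValidSimpleToken(); } ;

ValidEatAll : IgnoreL .*? IgnoreR { trimIgnoreLRTextfromTokenText(); } ;
Invalid : WS+ | .*? EOL? -> channel(HIDDEN) ;
IgnoreL : .*? MARK ;
IgnoreR : MARK .*? EOL? ;
fragment MARK : '| EATALL' ;
fragment TEXT : [a-zA-Z0-9_\-]+ ;   // one or more chars; '-' must be escaped in a set
fragment EOL : '\r'? '\n' ;
fragment WS : [ \t] ;
```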
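To show what "using a predicate to condition token rule selection" means in practice, here is a toy Python model of a lexer. It is not ANTLR's implementation: each rule may carry a predicate, a rule whose predicate fails is skipped, and among the remaining rules the longest match wins, with ties going to the rule listed first (the usual ANTLR lexer policy). `just_text_and_ws` is a hypothetical stand-in for the predicate helper on `ValidSimple` in the grammar above; the catch-all `Other` rule and all Python names are assumptions made for the demo.

```python
import re
from typing import Callable, Optional

# A rule is (token name, compiled regex, optional predicate on the whole line).
Rule = tuple[str, re.Pattern, Optional[Callable[[str], bool]]]

SIMPLE_LINE = re.compile(r"[A-Za-z0-9_\- \t]*")


def just_text_and_ws(line: str) -> bool:
    """Hypothetical predicate: True when the line holds only TEXT chars and blanks."""
    return SIMPLE_LINE.fullmatch(line) is not None


RULES: list[Rule] = [
    ("ValidSimple", re.compile(r"[A-Za-z0-9_-]+"), just_text_and_ws),
    ("WS", re.compile(r"[ \t]+"), None),
    ("Other", re.compile(r"[^ \t]+"), None),  # catch-all, assumed for the demo
]


def tokenize(line: str, rules: list[Rule]) -> list[tuple[str, str]]:
    tokens = []
    pos = 0
    while pos < len(line):
        best_name, best_end = None, pos
        for name, pattern, predicate in rules:
            # A failing predicate switches the rule off, much as a
            # semantic predicate does in an ANTLR lexer rule.
            if predicate is not None and not predicate(line):
                continue
            m = pattern.match(line, pos)
            # Longest match wins; on a tie the earlier rule is kept.
            if m is not None and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, line[pos:best_end]))
        pos = best_end
    return tokens
```

With these rules, `tokenize("foo bar-1", RULES)` yields `ValidSimple` tokens for `foo` and `bar-1`, whereas in `tokenize("foo $bar", RULES)` the predicate fails for the whole line, so `ValidSimple` is never considered and `foo` comes back as `Other`.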