
ANTLR parser for a custom requirement

I have a very peculiar requirement to parse inputs using ANTLR. I would like to be able to parse expressions like these:

Correct inputs

  • user name
  • user_name user-name
  • | EATALL any thing could come here/ok | EATALL ...

Invalid inputs

  • user/name
  • user&name^face

Any expression that comes after | EATALL and before the next | EATALL (if any) must be obtained as a single token. For other simple inputs, where | EATALL does not appear, only a valid combination of _, -, and [a-zA-Z0-9] is tokenized as one token. In pseudocode:

  • user name -> [user] [name]
  • user_name -> [user_name]
  • |EATALL user/name my user -> [user/name my user]
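To make the expected behavior concrete, here is a minimal reference tokenizer in plain Python. It is only a sketch of the rules as stated above, not an ANTLR solution. The function name `tokenize`, the choice to raise `ValueError` for invalid input, and the tolerance for optional spaces between `|` and `EATALL` are all assumptions made for illustration.

```python
import re
from typing import List

# Assumption: the marker is "|" followed by optional whitespace and "EATALL".
MARK = re.compile(r"\|\s*EATALL")
# A simple token: one or more letters, digits, underscores, or hyphens.
SIMPLE = re.compile(r"[A-Za-z0-9_-]+")


def tokenize(line: str) -> List[str]:
    """Tokenize one input line; raise ValueError on an invalid simple word."""
    parts = MARK.split(line)
    tokens = []
    # Text before the first marker follows the simple-word rule.
    for word in parts[0].split():
        if not SIMPLE.fullmatch(word):
            raise ValueError(f"invalid token: {word!r}")
        tokens.append(word)
    # Each span after a marker is kept whole, minus surrounding whitespace.
    for span in parts[1:]:
        span = span.strip()
        if span:
            tokens.append(span)
    return tokens
```

With this sketch, `tokenize("user name")` gives `["user", "name"]`, `tokenize("|EATALL user/name my user")` gives `["user/name my user"]`, and `tokenize("user/name")` raises `ValueError`.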

This already seems like an ambiguous case of tokenization to me. I am seeking your suggestions on dealing with problems like these in ANTLR. Thanks in advance.

So, what have you tried? Is your question specific to ANTLR 3, or can you use ANTLR 4?

For ANTLR 3, you can use semantic predicates to condition token rule selection. Since ANTLR 4 does not have symbolic semantic predicates, you can use native code actions to achieve essentially the same result. For example (untested):

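lexer grammar eatall ;

// Helper methods used below are not shown; they would live in @members.

// Option 1: gate the rule with a predicate on the current line.
ValidSimple : { isCurrentLineJustTEXTandWS() }? TEXT ;
// Option 2 (use instead of option 1): match the whole line, then split it in an action.
// ValidSimple : TEXT ( WS TEXT)* EOL?  { emitEachTEXTasNewValidSimpleToken(); } ;

// Text between the markers; the action strips the marker text from the token.
ValidEatAll : IgnoreL .*? IgnoreR    { trimIgnoreLRTextfromTokenText(); } ;
Invalid     : WS+ | .*? EOL?         -> channel(HIDDEN) ;

IgnoreL : .*? MARK ;
IgnoreR : MARK .*? EOL? ;

fragment MARK : '| EATALL' ;
fragment TEXT : [a-zA-Z0-9_\-]+ ;   // one or more word characters
fragment EOL  : '\r'? '\n' ;
fragment WS   : [ \t] ;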
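The grammar above calls helper methods that the answer does not define. As a rough guide to what they might do, here is a sketch of the string logic in plain Python, kept separate from any ANTLR runtime API. The function names, the exact behavior, and the tolerance for optional spaces inside the marker are guesses at the intent, not the answer author's code.

```python
import re
from typing import List

# Assumption: allow optional whitespace between "|" and "EATALL".
MARK = re.compile(r"\|\s*EATALL")
# A line made up only of simple-token characters, spaces, and tabs.
SIMPLE_LINE = re.compile(r"[A-Za-z0-9_\- \t]*")


def is_line_just_text_and_ws(line: str) -> bool:
    """Rough equivalent of the predicate guarding ValidSimple."""
    return SIMPLE_LINE.fullmatch(line.rstrip("\r\n")) is not None


def split_simple_words(token_text: str) -> List[str]:
    """Rough equivalent of emitting one ValidSimple token per word."""
    return token_text.split()


def trim_marker_text(token_text: str) -> str:
    """Keep only the text between the first marker and the next one."""
    first = MARK.search(token_text)
    if first is None:
        return token_text.strip()
    rest = token_text[first.end():]
    second = MARK.search(rest)
    body = rest if second is None else rest[:second.start()]
    return body.strip()
```

In a real lexer these would be written in the target language and placed in the grammar's `@members` section, reading and updating the current token text there.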

