
Have Antlr lexer handle syntax errors as tokens

I'm using Antlr 4.2.2 and Java 1.7 for some text processing. I've extended BaseErrorListener and overridden syntaxError() to report syntax errors, which works well. But I want the lexer to treat the mismatched text as a token and return it, rather than dropping it entirely.

In my lexer I have this rule:

TEXT : ~[<{|]+ ;

When I try to parse "foo { {" I get a syntax error, as expected: token recognition error at: '{ {'. But I'd like that '{ {' to be returned as a token as well, so that it doesn't get dropped from the input stream.

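You could add a catch-all lexer rule like this at the end of the grammar file (when two rules match the same single character, ANTLR picks the one defined first, so putting it last keeps it from stealing input your other rules handle):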

Error : . ;

This will produce Error tokens, which the parser will most likely report as extra "Error" tokens.
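To make the effect concrete, here is a plain-Java sketch that hand-codes the two rules (TEXT : ~[<{|]+ ; followed by the catch-all Error : . ;) and prints the tokens they would produce. It is not ANTLR code and not how the generated lexer works internally; the class and method names are hypothetical, and it assumes TEXT is the only other rule in the grammar.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hand-written simulation (not ANTLR) of a lexer with two rules:
 *   TEXT  : ~[<{|]+ ;
 *   Error : . ;      // catch-all, defined last
 */
public class CatchallLexerSketch {

    // Characters the TEXT rule refuses to match (mirrors ~[<{|]).
    private static final String EXCLUDED = "<{|";

    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int start = i;
            // TEXT: take the longest run of characters outside the excluded set.
            while (i < input.length() && EXCLUDED.indexOf(input.charAt(i)) < 0) {
                i++;
            }
            if (i > start) {
                tokens.add("TEXT:'" + input.substring(start, i) + "'");
            } else {
                // No other rule matches here, so the catch-all takes exactly one char
                // and the character becomes a token instead of being dropped.
                tokens.add("Error:'" + input.charAt(i) + "'");
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [TEXT:'foo ', Error:'{', TEXT:' ', Error:'{']
        System.out.println(tokenize("foo { {"));
    }
}
```

Note that the space between the two braces still matches TEXT, so "foo { {" yields two separate one-character Error tokens rather than a single '{ {' token.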

You could also do this:

SilentError : . -> channel(LexingErrorChannel); // LexingErrorChannel is a constant you must define yourself

This silently hides the lexing errors from the parser while still keeping them as tokens, which is useful if you'd rather handle or report them yourself.
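The channel idea can be sketched in plain Java (again a simulation, not ANTLR code): every token carries a channel number, the parser only reads the default channel, and the error tokens stay in the stream on a separate channel where you can collect them later. Channel 0 mirrors ANTLR's default channel; the value 2 for LexingErrorChannel and all class and method names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Plain-Java simulation (not ANTLR) of the SilentError approach: the catch-all
 * token is kept but moved to a separate channel that the parser never reads.
 */
public class ErrorChannelSketch {

    // 0 matches ANTLR's default channel; 2 is a hypothetical value
    // standing in for LexingErrorChannel.
    public static final int DEFAULT_CHANNEL = 0;
    public static final int LEXING_ERROR_CHANNEL = 2;

    public static final class Tok {
        final String type;
        final String text;
        final int channel;

        Tok(String type, String text, int channel) {
            this.type = type;
            this.text = text;
            this.channel = channel;
        }

        @Override
        public String toString() {
            return type + ":'" + text + "'";
        }
    }

    /** Hand-coded TEXT : ~[<{|]+ ; plus SilentError : . -> channel(...) ; */
    public static List<Tok> lex(String input) {
        List<Tok> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int start = i;
            while (i < input.length() && "<{|".indexOf(input.charAt(i)) < 0) {
                i++;
            }
            if (i > start) {
                tokens.add(new Tok("TEXT", input.substring(start, i), DEFAULT_CHANNEL));
            } else {
                // The offending character becomes a token, just not a visible one.
                tokens.add(new Tok("SilentError", String.valueOf(input.charAt(i)), LEXING_ERROR_CHANNEL));
                i++;
            }
        }
        return tokens;
    }

    /** Returns only the tokens on the given channel, in input order. */
    public static List<Tok> onChannel(List<Tok> all, int channel) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : all) {
            if (t.channel == channel) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> all = lex("foo { {");
        // What a parser reading only the default channel would see.
        System.out.println("parser sees: " + onChannel(all, DEFAULT_CHANNEL));
        // What you could report yourself afterwards.
        System.out.println("errors:      " + onChannel(all, LEXING_ERROR_CHANNEL));
    }
}
```

In real ANTLR, a CommonTokenStream built with the default channel plays the role of onChannel(all, DEFAULT_CHANNEL) for the parser. Afterwards you can call fill() on the stream, walk getTokens(), and check each Token's getChannel() to pick out the error tokens. In 4.2 the LexingErrorChannel name typically has to resolve to an int constant in the generated lexer, for example one declared in an @lexer::members block.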

But I wouldn't really do this if it can be avoided.

Note: this will produce one Error token per character. If you know which erroneous inputs to expect, you can add more specific alternatives like this:

Error : [<{|]+
      | .
      ;

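Be careful not to be too greedy, though: the lexer always picks the longest match, so an Error alternative that matches more characters than one of your real rules will swallow input that rule was meant to handle.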
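To see why greediness matters, here is a plain-Java sketch (not ANTLR code) of how ANTLR chooses between rules: the longest match wins, and on a tie the rule defined first wins. It pairs the Error : [<{|]+ | . variant with a hypothetical OPEN : '{{' ; rule defined earlier, standing in for whatever real tokens your grammar builds from these characters. The class, method, and OPEN rule are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simulates ANTLR's token choice (longest match, ties go to the earlier rule)
 * for three rules in this order:
 *   OPEN  : '{{' ;            // hypothetical "real" token
 *   TEXT  : ~[<{|]+ ;
 *   Error : [<{|]+ | . ;
 */
public class GreedyErrorSketch {

    private static final String SPECIAL = "<{|";

    // Each matchX returns the length of that rule's match at position i (0 = no match).
    private static int matchOpen(String s, int i) {
        return s.startsWith("{{", i) ? 2 : 0;
    }

    private static int matchText(String s, int i) {
        int j = i;
        while (j < s.length() && SPECIAL.indexOf(s.charAt(j)) < 0) j++;
        return j - i;
    }

    private static int matchError(String s, int i) {
        int j = i;
        while (j < s.length() && SPECIAL.indexOf(s.charAt(j)) >= 0) j++;
        // The '.' alternative guarantees Error matches at least one character.
        return Math.max(j - i, 1);
    }

    public static List<String> tokenize(String s) {
        String[] names = {"OPEN", "TEXT", "Error"};
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int[] lens = {matchOpen(s, i), matchText(s, i), matchError(s, i)};
            int best = 0;
            // Longest match wins; strict '>' keeps the earlier rule on ties.
            for (int r = 1; r < lens.length; r++) {
                if (lens[r] > lens[best]) best = r;
            }
            tokens.add(names[best] + ":'" + s.substring(i, i + lens[best]) + "'");
            i += lens[best];
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a{{b"));   // [TEXT:'a', OPEN:'{{', TEXT:'b']
        System.out.println(tokenize("a{{{b"));  // [TEXT:'a', Error:'{{{', TEXT:'b']
    }
}
```

For "a{{b", OPEN and Error both match two characters and OPEN wins the tie because it is defined first. For "a{{{b", Error matches three characters, so it beats OPEN and the valid '{{' disappears into an error token.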
