ANTLR - String Recognition Error

I have an ANTLR grammar file with the string definition below:

STRING
:  '"' (EscapeSequence | ~('\\'|'"') )* '"' ;
fragment EscapeSequence
  :   '\\' .
;

But this lexer rule ignores the escape character at the first instance of the quotes. The

id\=\"

is recognized as the start of the string even though there is a preceding escape character. This happens only for the first quote. All subsequent quotes, if escaped, are recognized properly.

/id\= \"Testing\" -- Should not be a string, as both quotes are escaped
/id\= "Testing" -- Should be a string between the quotes, since they are not escaped

The main problem to solve is to keep the lexer from trying to recognize a string when the character immediately preceding a quote is an escape character. If there are multiple escape characters, I need to consider only the single character immediately before the starting quote.

ANTLR will automatically provide the behavior you desire in almost every situation. Consider the following input:

/id\=\"Testing\"

The critical requirement involves the location and length of the token preceding the first quote character. In the following block, I add spaces only to illustrate conditions that occur between characters.

/ i d \ = \ " T e s t i n g \ "
           ^
           |
           ----------- Make sure no token can *end* here

By ensuring that the first " character is included as part of a token that also includes the \ character before it, you ensure that the first " character will never be interpreted as the start of a STRING token.

If the above condition is not met, your " character will be treated as the start of a STRING token.
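
The condition above can be illustrated with a minimal sketch. It is written in Python rather than ANTLR, and it assumes that a simple longest-match tokenizer is a fair approximation of how the lexer picks tokens. The rule names ESCAPED and OTHER and the function tokenize are hypothetical and do not come from the grammar in the question. The idea is that a dedicated rule consumes every backslash together with the character that follows it, so no token can end between a \ and a ".

```python
import re

# Hypothetical token rules, loosely mirroring the grammar in the question.
# ESCAPED is the assumed extra rule: a backslash always takes the next
# character with it, so no token can end between '\' and '"'.
RULES = [
    ("ESCAPED", re.compile(r'\\.', re.S)),
    ("STRING", re.compile(r'"(?:\\.|[^\\"])*"', re.S)),
    ("OTHER", re.compile(r'[^\\"]+')),
]


def tokenize(text, rules=RULES):
    """Longest-match tokenizer; ties go to the rule listed first."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in rules:
            match = pattern.match(text, pos)
            if match and match.end() > best_end:
                best_name, best_end = name, match.end()
        if best_name is None:
            # For example an unterminated quote or a trailing backslash.
            raise ValueError(f"no rule matches at offset {pos}")
        tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


# The two inputs from the question (raw strings keep the backslashes literal).
escaped = tokenize(r'/id\= \"Testing\"')
plain = tokenize(r'/id\= "Testing"')
print(escaped)
print(plain)
```

With these assumed rules, the first input produces no STRING token because each escaped quote is absorbed by ESCAPED, while the second input yields "Testing" as a STRING. In an actual grammar, the equivalent would be whichever token rule covers the text before the quote, so treat this as an illustration of the principle, not a drop-in fix.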
