Handling String Literals which End in an Escaped Quote in ANTLR4

How do I write a lexer rule to match a string literal which does not end in an escaped quote?

Here's my grammar:

lexer grammar StringLexer;

// from The Definitive ANTLR 4 Reference
STRING: '"' (ESC|.)*? '"';
fragment ESC : '\\"' | '\\\\' ;

Here's my Java block:

String s = "\"\\\""; // looks like "\"
StringLexer lexer = new StringLexer(new ANTLRInputStream(s)); 

Token t = lexer.nextToken();

if (t.getType() == StringLexer.STRING) {
    System.out.println("Saw a String");
}
else {
    System.out.println("Nope");
}

This outputs Saw a String. Should "\" really match STRING?

Edit: Both 280Z28's and Bart's solutions are great; unfortunately, I can only accept one.

For properly formed input, the lexer will match the text you expect. However, the use of the non-greedy operator will not prevent it from matching something with the following form:

'"' .*? '"'
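
To see the effect without running ANTLR, here is a minimal sketch that uses a java.util.regex pattern as a rough analog of the STRING rule from the question. The class and method names (LooseStringSketch, looseMatches) are made up for illustration, and a backtracking regex is not the same machine as an ANTLR lexer, so treat it as an approximation of the behavior rather than a description of how ANTLR works internally.

```java
import java.util.regex.Pattern;

// Regex analog (an assumption, not ANTLR itself) of:
//   STRING: '"' (ESC|.)*? '"';
//   fragment ESC : '\\"' | '\\\\' ;
final class LooseStringSketch {
    // Opening quote, a reluctant loop of ESC-or-any-character, closing quote.
    private static final Pattern LOOSE =
        Pattern.compile("\"(?:\\\\\"|\\\\\\\\|.)*?\"", Pattern.DOTALL);

    // True when the whole input is accepted by the loose pattern.
    static boolean looseMatches(String input) {
        return LOOSE.matcher(input).matches();
    }

    public static void main(String[] args) {
        String s = "\"\\\""; // the three characters "\"
        // Prints true: the '.' alternative consumes the backslash,
        // so the final quote closes the string.
        System.out.println(looseMatches(s));
    }
}
```

Because the wildcard is always available as a fallback, the escape alternative can never force the backslash and quote to stay together.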

To ensure strings are tokenized in the most "sane" way possible, I recommend using the following rules.

StringLiteral
  : UnterminatedStringLiteral '"'
  ;

UnterminatedStringLiteral
  : '"' (~["\\\r\n] | '\\' (. | EOF))*
  ;

If your language allows string literals to span multiple lines, you would likely need to modify UnterminatedStringLiteral to allow matching end-of-line characters.

If you do not include the UnterminatedStringLiteral rule, the lexer will handle unterminated strings by simply ignoring the opening " character of the string and proceeding to tokenize the content of the string.
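
As a rough illustration of how the two rules divide the input, here is a sketch that mirrors them with java.util.regex. The class and method names (StringLiteralSketch, classify) are hypothetical, and the possessive quantifier is my own choice to imitate a lexer that does not give back characters it has already consumed; it is an analog, not the ANTLR implementation.

```java
import java.util.regex.Pattern;

final class StringLiteralSketch {
    // Analog of: '"' (~["\\\r\n] | '\\' (. | EOF))*
    // The possessive *+ keeps every consumed escape pair together.
    private static final String UNTERMINATED =
        "\"(?:[^\"\\\\\\r\\n]|\\\\(?:.|\\z))*+";

    private static final Pattern UNTERMINATED_P =
        Pattern.compile(UNTERMINATED, Pattern.DOTALL);
    // Analog of: UnterminatedStringLiteral '"'
    private static final Pattern TERMINATED_P =
        Pattern.compile(UNTERMINATED + "\"", Pattern.DOTALL);

    // Names the rule that would match at the start of the input.
    // The terminated form is tried first because it is the longer match.
    static String classify(String input) {
        if (TERMINATED_P.matcher(input).lookingAt()) return "StringLiteral";
        if (UNTERMINATED_P.matcher(input).lookingAt()) return "UnterminatedStringLiteral";
        return "none";
    }

    public static void main(String[] args) {
        System.out.println(classify("\"abc\""));  // StringLiteral
        System.out.println(classify("\"\\\""));   // UnterminatedStringLiteral
        System.out.println(classify("\"abc"));    // UnterminatedStringLiteral
    }
}
```

With this split, the input "\" from the question is reported as an unterminated literal instead of a complete string, which a parser can then turn into a useful error message.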

Yes, "\" is matched by the STRING rule:

            STRING: '"' (ESC|.)*? '"';
                     ^       ^     ^
                     |       |     |
// matches:          "       \     "

If you don't want the . to match the backslash (and quote), do something like this:

STRING: '"' ( ESC | ~[\\"] )* '"';

And if your string can't be spread over multiple lines, do:

STRING: '"' ( ESC | ~[\\"\r\n] )* '"';
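
A quick way to sanity-check the stricter rule is a java.util.regex analog like the one below. The names (StrictStringSketch, strictMatches) are invented for this sketch, and it only approximates the lexer rule; the real check is to regenerate the lexer and rerun the Java block from the question.

```java
import java.util.regex.Pattern;

// Regex analog (an assumption, not ANTLR itself) of:
//   STRING: '"' ( ESC | ~[\\"\r\n] )* '"';
//   fragment ESC : '\\"' | '\\\\' ;
final class StrictStringSketch {
    // A backslash may only appear as part of \" or \\ ;
    // bare quotes and line breaks are excluded from the body.
    private static final Pattern STRICT =
        Pattern.compile("\"(?:\\\\\"|\\\\\\\\|[^\\\\\"\\r\\n])*\"");

    // True when the whole input is accepted by the strict pattern.
    static boolean strictMatches(String input) {
        return STRICT.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(strictMatches("\"\\\""));      // false: "\" has no closing quote
        System.out.println(strictMatches("\"a\\\"b\""));  // true:  "a\"b"
        System.out.println(strictMatches("\"a\nb\""));    // false: contains a line break
    }
}
```

Note that with ESC defined as in the question, any other backslash sequence (for example a backslash followed by n) is rejected as well, so ESC would need extending if the language supports more escapes.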
