String literals in ANTLR4

Question

I'm using the ANTLR4 C++ runtime and I'd like to define a string literal in my lexer grammar file. How can I do this?

What I have so far:

V_STRING            :   '"' ~('\\' | '"')* '"';

It doesn't work with

printf("string literal\n");

but works with

printf("string literal\\n");

I don't want to explicitly escape the newline character.

My assumption is that ANTLR interprets the newline character as a regular newline (when reading a file, for example).

Thanks in advance.

Answer 1

It's always a good idea to list out your token stream to see if your lexer rules really do what you expect. (Look into the -tokens option of the TestRig; also, some plugins will show you your tokens.)

In your case, your rule essentially says that a string is a " followed by zero or more characters that are neither a \ nor a ", and then a closing ".

So, when the lexer encounters your \, it stops matching the ~('\\' | '"')* part of the rule and then looks for a " (which it does not find, since the \ is followed by an n), so it won't recognize "string literal\n" as a V_STRING token. (It also fails to match "string literal\\n" here, so I'm not quite sure what's going on with the example that "works".)

Try:

V_STRING: '"' ~["]* '"';
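To see why the first rule rejects the input while the rule `'"' ~["]* '"'` accepts it, here is a minimal C++ sketch that imitates each rule by hand. The function names `matchStrict` and `matchLoose` are made up for illustration; this is not ANTLR-generated code, and a real lexer also weighs all other rules in the grammar, so treat it only as an approximation of the single rule in isolation.

```cpp
#include <cstddef>
#include <string>

// Hand-written stand-in for:  '"' ~('\\' | '"')* '"'
// Returns the length of the match at the start of `s`, or 0 if there is none.
std::size_t matchStrict(const std::string& s) {
    if (s.empty() || s[0] != '"') return 0;
    std::size_t i = 1;
    // Body: any character except a backslash or a double quote.
    while (i < s.size() && s[i] != '"' && s[i] != '\\') ++i;
    // The next character must be the closing quote; a backslash ends the match attempt.
    return (i < s.size() && s[i] == '"') ? i + 1 : 0;
}

// Hand-written stand-in for:  '"' ~["]* '"'
// Returns the length of the match at the start of `s`, or 0 if there is none.
std::size_t matchLoose(const std::string& s) {
    if (s.empty() || s[0] != '"') return 0;
    std::size_t i = 1;
    // Body: any character except a double quote (backslashes are allowed).
    while (i < s.size() && s[i] != '"') ++i;
    return (i < s.size()) ? i + 1 : 0;
}
```

For the source text `"string literal\n"` (a literal backslash followed by `n`), `matchStrict` stops at the backslash and reports no match, while `matchLoose` consumes everything up to the closing quote.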

Note: this is a very simple string rule, but it accepts your input. You probably want to examine grammars for other languages to see how you might want to handle strings in your language; there are several approaches (and many of them involve using lexer modes). You can find examples here.

If you want the "\n" to be treated as a newline, just understand that the parser won't do that for you; you'll just see the characters "\" and "n". It'll be up to you to handle decoding the escaped characters (and it's once you try to handle \" that it'll get more complicated and you'll need to look into lexer modes).
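As a rough illustration of that post-processing step, the sketch below decodes a small set of escapes from the raw token text. `decodeStringLiteral` is a hypothetical helper, not part of the ANTLR runtime, and the supported escapes (`\n`, `\t`, `\\`, `\"`) are an assumption about what your language allows. In the C++ runtime the input would typically come from calling `getText()` on the matched token.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not an ANTLR API): takes the raw token text,
// including the surrounding quotes, and returns the decoded value.
std::string decodeStringLiteral(const std::string& tokenText) {
    if (tokenText.size() < 2 || tokenText.front() != '"' || tokenText.back() != '"')
        throw std::invalid_argument("expected a double-quoted literal");

    std::string out;
    // Walk only the characters between the opening and closing quotes.
    for (std::size_t i = 1; i + 1 < tokenText.size(); ++i) {
        char c = tokenText[i];
        if (c != '\\') {
            out += c;
            continue;
        }
        // A backslash needs a following character before the closing quote.
        if (i + 2 >= tokenText.size())
            throw std::invalid_argument("dangling backslash");
        switch (tokenText[++i]) {
            case 'n':  out += '\n'; break;
            case 't':  out += '\t'; break;
            case '\\': out += '\\'; break;
            case '"':  out += '"';  break;
            default:   throw std::invalid_argument("unknown escape");
        }
    }
    return out;
}
```

With the simple rule `'"' ~["]* '"'` the `\"` case can never reach this function, because the token already ends at the first inner quote; the branch only becomes useful once the lexer rule itself accepts escaped quotes.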

Solution 1
0 2022-08-17 03:06:28
