
What is this strange StringBuilder doing in ANTLR-generated code?

I'm trying to learn how to use ANTLR4 in Unity. I saw the following code in the ActionLexer class of another program:

private static string _serializeATN()
{
        StringBuilder stringBuilder = new StringBuilder();
        stringBuilder.Append("\u0003а훑舆괭䐗껱趀ꫝ\u0002\u000e");
        stringBuilder.Append("\u00a0\b\u0001\u0004\u0002\t\u0002\u0004\u0003\t\u0003\u0004\u0004\t\u0004\u0004\u0005\t\u0005\u0004\u0006");
        stringBuilder.Append("\t\u0006\u0004\a\t\a\u0004\b\t\b\u0004\t\t\t\u0004\n\t\n\u0004\v\t\v\u0004\f\t\f");
        stringBuilder.Append("\u0004\r\t\r\u0004\u000e\t\u000e\u0004\u000f\t\u000f\u0004\u0010\t\u0010\u0004\u0011\t\u0011\u0004");
        stringBuilder.Append("\u0012\t\u0012\u0003\u0002\u0003\u0002\u0003\u0003\u0003\u0003\u0003\u0004\u0003\u0004\u0003\u0005\u0003\u0005\u0003");
        stringBuilder.Append("\u0006\u0003\u0006\u0003\a\u0003\a\u0003\b\u0003\b\u0003\b\u0003\b\u0003\b\u0003\b\u0003\b\u0003\b");   
        ...
        return stringBuilder.ToString();
}

I then copied that code into my Unity project and debugged it. The result is a strange string:

+       stringBuilder   "а훑舆괭䐗껱趀ꫝ \b\t\t\t\t\t\a\t\a\b\t\b\t\t\t\n\t\n\v\t\v\f\t\f" System.Text.StringBuilder

I want to know why that happens. What is the role of this function?

The ATN (Augmented Transition Network) is the internal network used by the ATN interpreter to execute the parser and lexer state machines. ANTLR generates this structure from the grammar it was given, and it is at the heart of the entire machinery of the ANTLR implementation.

The generated parsers and lexers need their ATN to work properly. But since the generated files are text, the generated network has to be serialised into a text string before it can be written to them. This string is then de-serialised on startup of the parsing application to rebuild the original ATN in memory. So in short: it's not text per se, but binary data stored as text.
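To make "binary data stored as text" concrete, here is a minimal sketch of the general idea. It is not ANTLR's actual serialisation format; the class `PackedTextDemo` and its methods `Pack` and `Unpack` are hypothetical names invented for this illustration. It simply stores each 16-bit number as one UTF-16 character and reads the numbers back from the string.

```csharp
using System;
using System.Text;

// Hypothetical illustration only: this is NOT ANTLR's real ATN format.
public static class PackedTextDemo
{
    // Store each 16-bit value as a single UTF-16 code unit in a string.
    public static string Pack(ushort[] values)
    {
        var sb = new StringBuilder(values.Length);
        foreach (ushort v in values)
        {
            sb.Append((char)v);
        }
        return sb.ToString();
    }

    // Recover the numbers by reading each character's numeric value.
    public static ushort[] Unpack(string text)
    {
        var values = new ushort[text.Length];
        for (int i = 0; i < text.Length; i++)
        {
            values[i] = text[i]; // char converts implicitly to ushort
        }
        return values;
    }
}
```

Packing small numbers such as 3, 2 and 14 this way yields characters like \u0003, \u0002 and \u000e, which is why a string built like this looks like garbage in a debugger: most of its "characters" are just numbers that happen to fall in the control-character range.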

The ATN belongs to the internals of the parser/lexer implementation, and you can safely ignore it for most purposes.

You are looking at non-printable Unicode characters. Quite what they are doing here is a bit of a mystery.

  • \u0002 is ASCII code 2 (STX)
  • \u0003 is ASCII code 3 (ETX)
  • \t is a tab character (ASCII code 9)
  • \a is the bell (alert) character, ASCII code 7 (BEL), not a line feed

https://www.rapidtables.com/code/text/ascii-table.html
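If you want to check for yourself what each character in such a string is, a small helper along these lines may be useful. This is only a sketch; `CodeUnitDump` and `Describe` are hypothetical names, not part of ANTLR or Unity. It lists every UTF-16 code unit as a hexadecimal code point and flags control characters using the standard `char.IsControl` method.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for inspecting strings full of invisible characters.
public static class CodeUnitDump
{
    // Returns one entry per UTF-16 code unit, e.g. "U+0003 (control)".
    public static List<string> Describe(string text)
    {
        var result = new List<string>();
        foreach (char c in text)
        {
            string kind = char.IsControl(c) ? "control" : "other";
            result.Add($"U+{(int)c:X4} ({kind})");
        }
        return result;
    }
}
```

Passing the return value of the generated function to `CodeUnitDump.Describe` and logging the entries shows the numeric values that the debugger renders as blanks or odd glyphs.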
