
How to write an LALR parser for some grammar in Java?

Posted 2011-03-23 05:14:25. Tags: java, parsing, lalr

I want to write Java code to build an LALR parser for my grammar. Can someone please suggest some books or links where I can learn how to write Java code for an LALR parser?

4 answers

Writing an LALR parser by hand is difficult, but it can be done. If you want to learn the theory behind constructing such parsers by hand, consider looking into "Parsing Techniques: A Practical Guide" by Grune and Jacobs. It's an excellent book on general parsing techniques, and the chapter on LR parsing is particularly good.
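The following is a minimal sketch of the first step of building LR-family tables, the kind of construction such a book covers: computing the canonical LR(0) item sets with closure and goto. An LALR(1) automaton has exactly these states. The lookahead computation that turns them into LALR(1) tables is omitted. The toy grammar (S' -> E, E -> E + n, E -> n) and the class name Lr0Items are hypothetical and were chosen only for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Canonical LR(0) item sets for a hypothetical toy grammar (sketch).
 *   rule 0: S' -> E    rule 1: E -> E + n    rule 2: E -> n
 * An item is encoded as List.of(rule, dotPosition).
 */
public class Lr0Items {
    public static final String[] LHS = {"S'", "E", "E"};
    public static final String[][] RHS = {{"E"}, {"E", "+", "n"}, {"n"}};

    /** Closure: for every item A -> a . B b, add B -> . g for each rule of B, until nothing changes. */
    public static Set<List<Integer>> closure(Set<List<Integer>> items) {
        Set<List<Integer>> result = new HashSet<>(items);
        Deque<List<Integer>> work = new ArrayDeque<>(items);
        while (!work.isEmpty()) {
            List<Integer> item = work.pop();
            String[] rhs = RHS[item.get(0)];
            int dot = item.get(1);
            if (dot == rhs.length) continue;          // dot at the end: nothing to expand
            for (int r = 0; r < LHS.length; r++) {
                if (LHS[r].equals(rhs[dot])) {        // only nonterminals have rules
                    List<Integer> added = List.of(r, 0);
                    if (result.add(added)) work.push(added);
                }
            }
        }
        return result;
    }

    /** Goto: move the dot over `symbol` in every item where that is possible, then close. */
    public static Set<List<Integer>> gotoSet(Set<List<Integer>> items, String symbol) {
        Set<List<Integer>> moved = new HashSet<>();
        for (List<Integer> item : items) {
            String[] rhs = RHS[item.get(0)];
            int dot = item.get(1);
            if (dot < rhs.length && rhs[dot].equals(symbol)) {
                moved.add(List.of(item.get(0), dot + 1));
            }
        }
        return closure(moved);
    }

    /** Builds all states reachable from closure({S' -> . E}). */
    public static List<Set<List<Integer>>> canonicalCollection() {
        Set<String> symbols = new LinkedHashSet<>();
        for (String[] rhs : RHS) symbols.addAll(Arrays.asList(rhs));
        List<Set<List<Integer>>> states = new ArrayList<>();
        states.add(closure(Set.of(List.of(0, 0))));
        for (int i = 0; i < states.size(); i++) {     // the list grows as new states are found
            for (String sym : symbols) {
                Set<List<Integer>> next = gotoSet(states.get(i), sym);
                if (!next.isEmpty() && !states.contains(next)) states.add(next);
            }
        }
        return states;
    }

    /** Renders an item as e.g. "E -> E . + n". */
    public static String show(List<Integer> item) {
        String[] rhs = RHS[item.get(0)];
        StringBuilder sb = new StringBuilder(LHS[item.get(0)]).append(" ->");
        for (int k = 0; k <= rhs.length; k++) {
            if (k == item.get(1)) sb.append(" .");
            if (k < rhs.length) sb.append(' ').append(rhs[k]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Set<List<Integer>>> states = canonicalCollection();
        for (int i = 0; i < states.size(); i++) {
            System.out.println("I" + i + ":");
            for (List<Integer> item : states.get(i)) System.out.println("  " + show(item));
        }
    }
}
```

For this grammar the sketch finds 5 states. Items within a state print in arbitrary order because they are kept in a HashSet.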

If you're more interested in just getting an LALR parser written in Java, consider looking into Java CUP, which is a general-purpose LALR parser generator for Java.

Hope this helps!

You can split the LALR functionality into two parts: preparing the tables and parsing the input.

The first part is complex and error-prone, so even if you'd like to know how it works, I suggest using a proven table generator for the LALR states (and for the tokenizer DFA as well).
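Here is a minimal sketch of a table-driven DFA tokenizer with longest-match ("maximal munch") behaviour. It gives a sense of what the tokenizer part of a generator's output amounts to once it is produced. The transition table is hand-written for a hypothetical token set (integers, "+", whitespace), and the class name DfaLexer is made up. A real generator would emit a much larger table in its own file format.

```java
import java.util.ArrayList;
import java.util.List;

/** Table-driven DFA tokenizer (sketch); the table below is hand-written, not generated. */
public class DfaLexer {
    // Character classes: 0 = digit, 1 = '+', 2 = whitespace, 3 = anything else.
    static int charClass(char c) {
        if (c >= '0' && c <= '9') return 0;
        if (c == '+') return 1;
        if (c == ' ' || c == '\t' || c == '\n' || c == '\r') return 2;
        return 3;
    }

    // NEXT[state][charClass]; -1 means "no transition".
    // States: 0 = start, 1 = in number, 2 = seen '+', 3 = in whitespace.
    static final int[][] NEXT = {
        //  digit  '+'   ws  other
        {    1,     2,    3,  -1 },  // 0 start
        {    1,    -1,   -1,  -1 },  // 1 number
        {   -1,    -1,   -1,  -1 },  // 2 plus
        {   -1,    -1,    3,  -1 },  // 3 whitespace
    };
    // Token kind accepted in each state (null = not accepting); "WS" tokens are skipped.
    static final String[] ACCEPT = {null, "NUMBER", "PLUS", "WS"};

    /** Returns tokens as "KIND:text" strings; throws if no token matches at some position. */
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int state = 0, lastAccept = -1, lastEnd = pos;
            // Run the DFA as far as it goes, remembering the last accepting position.
            for (int i = pos; i < input.length(); i++) {
                state = NEXT[state][charClass(input.charAt(i))];
                if (state < 0) break;
                if (ACCEPT[state] != null) {
                    lastAccept = state;
                    lastEnd = i + 1;
                }
            }
            if (lastAccept < 0) {
                throw new IllegalArgumentException("unexpected character at position " + pos);
            }
            if (!ACCEPT[lastAccept].equals("WS")) {
                tokens.add(ACCEPT[lastAccept] + ":" + input.substring(pos, lastEnd));
            }
            pos = lastEnd;  // continue after the longest match
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [NUMBER:12, PLUS:+, NUMBER:3, PLUS:+, NUMBER:45]
        System.out.println(tokenize("12 + 3+45"));
    }
}
```

An input such as "1x" raises an exception after emitting NUMBER:1, because no token can start with "x". The loop only reads the table, so swapping in a generated table would not change it.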

The second part consists of consuming those tables with fairly simple algorithms that tokenize the input and turn it into a parse tree (concrete syntax tree). This part is much easier to implement yourself if you want to. Doing so also gives you full control over how it works and what it does.
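Here is a minimal sketch of that second part: a table-driven LR driver that consumes ACTION and GOTO tables with a state stack. The tables are hand-built for a hypothetical toy grammar (S' -> E, E -> E + n, E -> n) and happen to be valid LALR(1) tables for it. The string encoding ("s2" = shift to state 2, "r1" = reduce by rule 1, "acc" = accept) and the class name LrDriver are made up. A real engine would load its tables from a generator's output file instead.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Table-driven LR parser driver (sketch). Hand-built tables for:
 *   rule 0: S' -> E    rule 1: E -> E + n    rule 2: E -> n
 * Tokens are "+" or an integer literal (treated as the terminal n).
 */
public class LrDriver {
    // ACTION[state][lookahead]: "sN" = shift, go to state N; "rN" = reduce by rule N; "acc" = accept.
    static final Map<Integer, Map<String, String>> ACTION = new HashMap<>();
    // GOTO[state][nonterminal]: state to enter after a reduction.
    static final Map<Integer, Map<String, Integer>> GOTO = new HashMap<>();
    // Length of each rule's right-hand side (how many stack entries a reduce pops).
    static final int[] RHS_LEN = {1, 3, 1};

    static {
        ACTION.put(0, Map.of("n", "s2"));
        ACTION.put(1, Map.of("+", "s3", "$", "acc"));
        ACTION.put(2, Map.of("+", "r2", "$", "r2"));
        ACTION.put(3, Map.of("n", "s4"));
        ACTION.put(4, Map.of("+", "r1", "$", "r1"));
        GOTO.put(0, Map.of("E", 1));
    }

    /** Parses the tokens; the semantic action of each reduce sums the operands. */
    public static int parse(List<String> tokens) {
        Deque<Integer> states = new ArrayDeque<>();
        Deque<Integer> values = new ArrayDeque<>();  // one semantic value per grammar symbol on the stack
        states.push(0);
        int pos = 0;
        while (true) {
            String tok = pos < tokens.size() ? tokens.get(pos) : null;
            String la = tok == null ? "$" : tok.equals("+") ? "+" : "n";  // lookahead terminal
            String act = ACTION.getOrDefault(states.peek(), Map.of()).get(la);
            if (act == null) {
                throw new IllegalArgumentException("syntax error at token " + pos + " (" + la + ")");
            }
            if (act.equals("acc")) {
                return values.peek();
            }
            int n = Integer.parseInt(act.substring(1));
            if (act.charAt(0) == 's') {
                // Shift: push the new state and the token's value ("+" carries none, so 0).
                states.push(n);
                values.push(la.equals("n") ? Integer.parseInt(tok) : 0);
                pos++;
            } else {
                // Reduce by rule n: pop its right-hand side, compute the value, then take GOTO on E.
                int[] rhs = new int[RHS_LEN[n]];
                for (int i = rhs.length - 1; i >= 0; i--) {
                    states.pop();
                    rhs[i] = values.pop();
                }
                int result = (n == 1) ? rhs[0] + rhs[2] : rhs[0];
                states.push(GOTO.get(states.peek()).get("E"));
                values.push(result);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of("1", "+", "2", "+", "3")));  // prints 6
    }
}
```

The reduce branch is where a real implementation would build parse-tree nodes instead of summing numbers. The loop itself stays the same for any grammar; only the tables change.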

For parsing tasks, I personally use the free GOLD Parsing System, which has a nice UI for creating and debugging the grammar. It also generates table files that can then be loaded and processed by an existing engine or by your own implementation (the file format for these CGT files is well documented).

As previously stated, you would normally use a parser generator to produce an LALR parser. A few such tools for Java are:

SableCC (my personal favourite)
CUP
Beaver 3
SJPT
Gold

Just want to mention that my project CookCC ( http://coconut2015.github.io/cookcc/ ) is an LALR(1) parser generator with a lexer generator (the lexer part works much like flex).

The unique feature of CookCC is that you can write your lexer and parser directly in Java source using annotations. See the calculator example here: https://github.com/coconut2015/cookcc/blob/master/tests/javaap/calc/Calculator.java