Dynamically extensible generic parser

Original post: 2009-09-07 00:49:14 9 3 c#/ java/ parsing

I wrote an application that uses a meta-parser generated with CSharpCC (a port of JavaCC). Everything works very well.

Given the nature of the project, I would like more flexibility to extend the syntax of the meta-language used by the application. Do you know of any existing libraries (or articles describing the implementation process) for Java or C# that I could use to implement my own parser programmatically, without being forced to rely on a static syntax?

Thank you very much for your support.

3 answers

Would Scala's combinator parsers do the trick for you? Since Scala compiles to Java bytecode, anything you write can be called from your Java code however you please.
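To make the combinator idea concrete, here is a minimal sketch in plain Java. It is not Scala's library and does not mirror its API; the names Combinators, Parser, lit, seq, Choice and accepts are all made up for illustration. The point it tries to show is that the grammar is an ordinary object graph, so a rule can gain alternatives while the program is running.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the combinator idea; these names are not Scala's API.
final class Combinators {
    /** A parser returns the position just after its match, or -1 on failure. */
    interface Parser {
        int parse(String in, int pos);
    }

    /** Matches the literal string s at the current position. */
    static Parser lit(String s) {
        return (in, pos) -> in.startsWith(s, pos) ? pos + s.length() : -1;
    }

    /** Matches each part in order, feeding the end of one into the next. */
    static Parser seq(Parser... parts) {
        return (in, pos) -> {
            int p = pos;
            for (Parser part : parts) {
                p = part.parse(in, p);
                if (p < 0) return -1;
            }
            return p;
        };
    }

    /** Ordered choice whose alternatives can be added while the program runs. */
    static final class Choice implements Parser {
        private final List<Parser> alternatives = new ArrayList<>();

        Choice add(Parser p) {
            alternatives.add(p);
            return this;
        }

        @Override
        public int parse(String in, int pos) {
            for (Parser alt : alternatives) {
                int next = alt.parse(in, pos);
                if (next >= 0) return next; // commit to the first alternative that matches
            }
            return -1;
        }
    }

    /** True if p matches the whole input. */
    static boolean accepts(Parser p, String in) {
        return p.parse(in, 0) == in.length();
    }

    public static void main(String[] args) {
        Choice keyword = new Choice().add(lit("get"));
        Parser command = seq(keyword, lit(" x"));
        System.out.println(accepts(command, "set x")); // false: "set" is not known yet
        keyword.add(lit("set"));                        // extend the syntax at run time
        System.out.println(accepts(command, "set x")); // true
    }
}
```

Note that this Choice commits to the first alternative that matches and never revisits it, which keeps the sketch short but is weaker than a full backtracking choice.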

Take a look at the way the JNode command-line interface handles parsing of command-line arguments. Each command 'registers' descriptors for the arguments it expects. The command-line syntax is specified separately in XML descriptors, allowing users to tailor a command's syntax to meet their needs.

This is underpinned by a framework of Argument classes that are basically context-sensitive token recognizers, and a two-level grammar / parser. The parser 'prepares' a user-friendly form of a command syntax into something like BNF, then does a naive backtracking parse, accepting the first complete parse that it finds.
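As a rough illustration of "BNF-like rules held as data plus a naive backtracking parse", here is a small Java sketch. It is not JNode's code: the class BacktrackingParser, its rule and accepts methods, and the toy "cp" grammar are assumptions made up for this example. Any symbol without a rule is treated as a terminal that must equal the next token, and the search stops at the first complete parse.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not taken from JNode.
final class BacktrackingParser {
    // Nonterminal name -> list of alternatives, each a sequence of symbols.
    private final Map<String, List<List<String>>> rules = new HashMap<>();

    /** Adds one alternative for a nonterminal; may be called at any time. */
    BacktrackingParser rule(String name, String... symbols) {
        rules.computeIfAbsent(name, k -> new ArrayList<>()).add(List.of(symbols));
        return this;
    }

    /** True if the tokens can be derived from the start symbol. */
    boolean accepts(String start, List<String> tokens) {
        return match(List.of(start), tokens, 0);
    }

    // Tries to match the pending symbols against tokens[pos..]; returns on the
    // first complete parse. Left-recursive rules would recurse forever here.
    private boolean match(List<String> symbols, List<String> tokens, int pos) {
        if (symbols.isEmpty()) return pos == tokens.size();
        String head = symbols.get(0);
        List<String> rest = symbols.subList(1, symbols.size());
        List<List<String>> alternatives = rules.get(head);
        if (alternatives == null) { // no rule: head is a terminal
            return pos < tokens.size()
                    && tokens.get(pos).equals(head)
                    && match(rest, tokens, pos + 1);
        }
        for (List<String> alt : alternatives) {
            List<String> expanded = new ArrayList<>(alt);
            expanded.addAll(rest);
            if (match(expanded, tokens, pos)) return true; // first complete parse wins
        }
        return false; // every alternative failed: backtrack
    }

    public static void main(String[] args) {
        BacktrackingParser p = new BacktrackingParser()
                .rule("copy", "cp", "flags", "path", "path")
                .rule("flags", "-r", "flags")
                .rule("flags") // empty alternative
                .rule("path", "src")
                .rule("path", "dst");
        System.out.println(p.accepts("copy", List.of("cp", "-r", "src", "dst"))); // true
        System.out.println(p.accepts("copy", List.of("cp", "-v", "src", "dst"))); // false
        p.rule("flags", "-v", "flags"); // extend the syntax at run time
        System.out.println(p.accepts("copy", List.of("cp", "-v", "src", "dst"))); // true
    }
}
```

Because every alternative is retried from scratch, the running time of a search like this can grow exponentially with input length on ambiguous grammars.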

The downside of the current implementation is that the parser is inefficient, and probably impractical for parsing input of more than 20 or so tokens, depending on the syntax. (We have ideas for improving performance, but a real fix is probably not possible without a major redesign ... and banning potentially ambiguous command syntaxes.)

(Aside: one motivation for this is to support intelligent command argument completion. To do this, the parser runs in a "completion" mode in which it explores all possible (partial) parses, noting its state when it encounters the token / position that the user is trying to complete. Where appropriate, the corresponding Argument classes are then asked to provide context-sensitive completions for the current "word".)
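The completion mode described in the aside can be sketched along similar lines. Again, this is not JNode's implementation: CompletionSketch, its methods and the toy grammar are hypothetical, and the Argument classes are reduced here to plain terminal strings. The sketch explores every partial parse of the tokens typed so far and, whenever a parse reaches the cursor, records each terminal that could come next and that starts with the partially typed word.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch; not taken from JNode.
final class CompletionSketch {
    // Nonterminal name -> list of alternatives, each a sequence of symbols.
    private final Map<String, List<List<String>>> rules = new HashMap<>();

    /** Adds one alternative for a nonterminal. */
    CompletionSketch rule(String name, String... symbols) {
        rules.computeIfAbsent(name, k -> new ArrayList<>()).add(List.of(symbols));
        return this;
    }

    /** Terminals that may follow the finished tokens and start with partial. */
    SortedSet<String> complete(String start, List<String> done, String partial) {
        SortedSet<String> out = new TreeSet<>();
        collect(List.of(start), done, 0, partial, out);
        return out;
    }

    // Explores all partial parses instead of stopping at the first one.
    // Left-recursive rules would recurse forever here.
    private void collect(List<String> symbols, List<String> done, int pos,
                         String partial, SortedSet<String> out) {
        if (symbols.isEmpty()) return; // nothing more is expected on this path
        String head = symbols.get(0);
        List<String> rest = symbols.subList(1, symbols.size());
        List<List<String>> alternatives = rules.get(head);
        if (alternatives == null) { // no rule: head is a terminal
            if (pos == done.size()) {
                // The parse has reached the cursor: head is a candidate.
                if (head.startsWith(partial)) out.add(head);
            } else if (done.get(pos).equals(head)) {
                collect(rest, done, pos + 1, partial, out);
            }
            return;
        }
        for (List<String> alt : alternatives) {
            List<String> expanded = new ArrayList<>(alt);
            expanded.addAll(rest);
            collect(expanded, done, pos, partial, out);
        }
    }

    public static void main(String[] args) {
        CompletionSketch c = new CompletionSketch()
                .rule("copy", "cp", "flags", "path", "path")
                .rule("flags", "-r", "flags")
                .rule("flags") // empty alternative
                .rule("path", "src")
                .rule("path", "dst");
        System.out.println(c.complete("copy", List.of("cp"), ""));  // [-r, dst, src]
        System.out.println(c.complete("copy", List.of("cp"), "s")); // [src]
    }
}
```

In a fuller design the terminal check would presumably be delegated to an argument object that knows how to list, say, file names or aliases for the current word; here a fixed string stands in for that.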

The parser (written in C#) used in the Heron language (a simple object-oriented language) is relatively simple and stable, and should be easy to modify for your needs. You can download the source here.