
How can I build an interactive parser with ANTLR?

I've been looking at Java and ANTLR4, a very nice combination for building parsers. However, as I test them, I notice that parsing doesn't start until I send an EOF (Ctrl-D in a Mac terminal, for example) to the input. That's fine for parsing a file, but I can easily imagine building tools such as command-line shells or processors very quickly with ANTLR. That isn't doable unless I can make it parse as characters are typed, so that things happen after RETURN, or even after a TAB if one wanted to do command completion, say.

Does anyone know how to do this?

The simplest way to use Antlr4 'interactively' is to recognize that the parsing operation is quite fast and that, in a warm VM, re-instantiating the parser is also quite fast. Indeed, it is more than fast enough to re-parse the entire input text between keystrokes.

The basic strategy is, on each key event, to grab the entire current input text and process it in a non-display thread. If the processing does not complete before the next key event, discard the processing thread and start a new one. When a processing iteration does complete, buffer the next key event (as needed) and apply the results to the input text.
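The strategy above can be sketched with the standard library alone. This is a minimal sketch, not the answerer's code: the class `KeystrokeReparser` and its `parse` and `onResult` parameters are hypothetical names, and `parse` stands in for whatever builds a fresh lexer and parser over the full text. A generation counter ensures that only the result for the most recent key event is applied.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical helper: re-parses the whole input per key event, dropping stale runs. */
final class KeystrokeReparser<R> {
    // Daemon worker threads, so an abandoned parse never keeps the JVM alive.
    private final ExecutorService worker = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "reparse-worker");
        t.setDaemon(true);
        return t;
    });
    private final Function<String, R> parse;  // assumed: builds a fresh parser and runs it
    private final Consumer<R> onResult;       // assumed: applies the result to the input text
    private long generation = 0;              // guarded by 'this'
    private Future<?> pending;                // most recent parse task, guarded by 'this'

    KeystrokeReparser(Function<String, R> parse, Consumer<R> onResult) {
        this.parse = parse;
        this.onResult = onResult;
    }

    /** Call from the key-event handler with the entire current input text. */
    synchronized Future<?> onKeyEvent(String fullText) {
        long myGeneration = ++generation;
        if (pending != null) {
            pending.cancel(true);             // discard the previous run if unfinished
        }
        pending = worker.submit(() -> {
            R result = parse.apply(fullText);
            publishIfCurrent(myGeneration, result);
        });
        return pending;
    }

    /** Applies a result only if no newer key event has arrived in the meantime. */
    private synchronized void publishIfCurrent(long myGeneration, R result) {
        if (myGeneration == generation) {
            onResult.accept(result);
        }
    }
}
```

In a GUI, `onResult` would typically hand the result back to the display thread rather than touch widgets directly; that hand-off is toolkit-specific and left out here.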

A sustained stream of keystrokes is unlikely to be faster than 100ms per key event (about 80 wpm). On my system, repeated simple parsing of an editor's 'page' of code using the Java.g4 grammar averages around 5ms. Even with fairly significant processing, the background thread rarely requires more than about 25ms to complete. Of course, YMMV.
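To check figures like these on your own system, a rough timing harness along the following lines may help. It is a sketch only: `ParseTimer` is a hypothetical name, `parseOnce` stands in for one full re-parse of the text, and the warm-up runs are there because the numbers above assume a warm VM.

```java
/** Hypothetical helper for rough, wall-clock timing of repeated parses. */
final class ParseTimer {
    /** Average milliseconds per run, measured after uncounted warm-up runs. */
    static double averageMillis(Runnable parseOnce, int warmupRuns, int timedRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            parseOnce.run();                  // let the JIT compile the hot paths first
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedRuns; i++) {
            parseOnce.run();
        }
        return (System.nanoTime() - start) / 1_000_000.0 / timedRuns;
    }
}
```

If the average stays well under the roughly 100ms between key events, re-parsing the whole text on every keystroke should be comfortable.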

Update

If the need is for continuous stream processing rather than 'interactive' use, Antlr can be adapted to that purpose. This requires a minimal custom lexer that meets the Lexer and TokenStream interfaces but waits for actual input data whenever the parser calls getCurrentToken(), the parser's primary function for fetching the next token from the lexer.

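    // StreamLexer is the custom lexer described above; it blocks until input arrives.
    StreamLexer tokens = new StreamLexer(yourInputStream); // custom lexer
    YourParser parser = new YourParser(tokens);
    parser.removeErrorListeners(); // remove the default ConsoleErrorListener
    parser.addErrorListener(new YourErrorListener());
    parser.setErrorHandler(new YourParserErrorStrategy());
    parser.start(); // invoke the grammar's entry rule, named "start" here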

There is no actual lexer grammar: the custom lexer simply wraps every input character as a separate token, and the parser rules are written accordingly.
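The character-per-token idea can be illustrated without the ANTLR runtime. The sketch below is not the `StreamLexer` from the answer, whose source is not shown; `CharToken` and `StreamCharTokenSource` are hypothetical stand-ins. In a real ANTLR 4 setup the equivalent class would presumably implement the runtime's `TokenSource` interface and return ANTLR `Token` objects, which is an assumption on my part.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for an ANTLR Token: one token per input character. */
final class CharToken {
    static final int EOF = -1;
    final int type;   // the UTF-16 code unit read from the stream, or EOF
    final int index;  // position of the token in the stream

    CharToken(int type, int index) {
        this.type = type;
        this.index = index;
    }

    String text() {
        return type == EOF ? "<EOF>" : String.valueOf((char) type);
    }
}

/** Hypothetical token source that waits for input instead of reading to EOF up front. */
final class StreamCharTokenSource {
    private final Reader in;
    private int index = 0;

    StreamCharTokenSource(Reader in) {
        this.in = in;
    }

    /** Blocks until a character is available, then wraps it as a token. */
    CharToken nextToken() {
        try {
            int c = in.read();  // blocks until input, end of stream, or an error
            return new CharToken(c < 0 ? CharToken.EOF : c, index++);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because `nextToken()` blocks inside `Reader.read()`, a parser pulling tokens from it makes progress as each character arrives rather than waiting for end of input.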

In effect, this turns the standard Antlr parser into a grammar-defined 'Push-Parser'. Speed will be limited by the run time of the parser's matching functions or the data rate of the input stream, whichever is slower.

To achieve any significantly greater parsing speed, a purpose-built state machine will likely be necessary.


 