Parsing a regular expression based on a context-free grammar

Question

Good evening, Stack Overflow. I'd like to develop an interpreter for expressions based on a pretty simple context-free grammar:

Grammar

Basically, the language consists of two base statements:

( SET var 25 ) // Output: var = 25
( GET ( MUL var 5 ) ) // Output: 125
( SET var2 ( MUL 30 5 ) ) // Output: var2 = 150

Now, I'm fairly sure about what I should do to interpret a statement: 1) lexical analysis, to turn a statement into a sequence of tokens; 2) syntax analysis, to build a symbol table (a HashMap with the variables and their values) and a syntax tree (to perform the GET statements); 3) an inorder visit of the tree to get the results I want.
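Steps 2 and 3 can be made concrete with a minimal Java sketch, assuming the parser has already built the tree. The names TreeEval, Expr, num, variable, mul, set and get are hypothetical, not from the original post. One detail worth noting: an operator node needs its operands' values before it can compute, so the evaluation below visits children before the node (a post-order traversal), rather than a strict inorder one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical AST for the example statements; the symbol table is a plain HashMap.
class TreeEval {
    // A tree node that computes its value against the current symbol table.
    interface Expr { int eval(Map<String, Integer> symbols); }

    // Leaf: an integer literal.
    static Expr num(int value) { return symbols -> value; }

    // Leaf: a variable looked up in the symbol table.
    static Expr variable(String name) {
        return symbols -> {
            Integer v = symbols.get(name);
            if (v == null) throw new IllegalStateException("undefined variable: " + name);
            return v;
        };
    }

    // Inner node: both children are evaluated before the multiplication (post-order).
    static Expr mul(Expr left, Expr right) {
        return symbols -> left.eval(symbols) * right.eval(symbols);
    }

    // ( SET name expr ): evaluate expr and store the result in the symbol table.
    static int set(Map<String, Integer> symbols, String name, Expr expr) {
        int value = expr.eval(symbols);
        symbols.put(name, value);
        return value;
    }

    // ( GET expr ): evaluate expr against the current symbol table.
    static int get(Map<String, Integer> symbols, Expr expr) {
        return expr.eval(symbols);
    }

    public static void main(String[] args) {
        Map<String, Integer> symbols = new HashMap<>();
        set(symbols, "var", num(25));                                   // ( SET var 25 )
        System.out.println(get(symbols, mul(variable("var"), num(5)))); // ( GET ( MUL var 5 ) )
        set(symbols, "var2", mul(num(30), num(5)));                     // ( SET var2 ( MUL 30 5 ) )
        System.out.println(symbols.get("var2"));
    }
}
```

Running main prints 125 and then 150, matching the outputs listed for the example statements.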

I'd like some advice on how to parse the source file. Considering that the parser should ignore any whitespace, tab or newline, is it possible to use a Java Pattern to extract a general statement for analysis? Is there a good way to read a weirdly formatted (and possibly more complex) statement like this

(
  SET var

 25
 )

without confusing the parser with the opening and closing parentheses?
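One common way to make the layout irrelevant is to let the lexer discard whitespace, so spaces, tabs and newlines never reach the parser and each parenthesis always becomes its own token. A minimal sketch, assuming only the keywords SET, GET and MUL, non-negative integer literals and alphanumeric variable names; Lexer and its token types are hypothetical names (OPEN_PAR, STATEMENT_SET and VARIABLE echo the names used in the second answer).

```java
import java.util.ArrayList;
import java.util.List;

// Minimal lexer sketch: whitespace only separates tokens and is never emitted.
class Lexer {
    enum Type { OPEN_PAR, CLOSE_PAR, STATEMENT_SET, STATEMENT_GET, OPERATOR_MUL, NUMBER, VARIABLE }

    static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
        @Override public String toString() { return type + "(" + text + ")"; }
    }

    static List<Token> tokenize(String source) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; } // spaces, tabs, newlines
            if (c == '(') { tokens.add(new Token(Type.OPEN_PAR, "(")); i++; continue; }
            if (c == ')') { tokens.add(new Token(Type.CLOSE_PAR, ")")); i++; continue; }
            int start = i;
            if (Character.isDigit(c)) {
                while (i < source.length() && Character.isDigit(source.charAt(i))) i++;
                tokens.add(new Token(Type.NUMBER, source.substring(start, i)));
            } else if (Character.isLetter(c)) {
                while (i < source.length() && Character.isLetterOrDigit(source.charAt(i))) i++;
                String word = source.substring(start, i);
                Type type;
                switch (word) {
                    case "SET": type = Type.STATEMENT_SET; break;
                    case "GET": type = Type.STATEMENT_GET; break;
                    case "MUL": type = Type.OPERATOR_MUL; break;
                    default:    type = Type.VARIABLE; // any other word is a variable name
                }
                tokens.add(new Token(type, word));
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "' at " + i);
            }
        }
        return tokens;
    }
}
```

For the oddly formatted statement above, Lexer.tokenize("(\n  SET var\n\n 25\n )") yields the same five tokens as "( SET var 25 )": OPEN_PAR, STATEMENT_SET, VARIABLE, NUMBER, CLOSE_PAR. Matching the parentheses is then the parser's job, not the lexer's.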

For example:

import java.util.Scanner;

Scanner scan; // scanner reading the source file
String pattern = "..."; // ideal pattern I've found to represent an expression
while (scan.hasNext(pattern)) {
  Interpreter.computeStatement(scan.next(pattern));
}

Would this be a viable option for this problem?
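On viability: by default a Scanner splits its input on whitespace, and hasNext(pattern) only tests whether the next single token matches, so a multi-word statement such as ( SET var 25 ) can never match as a whole. Changing the delimiter does not fix the deeper problem: Java's regular expressions have no recursion construct, so matching arbitrarily nested parentheses with one pattern is impractical at best. A sketch of an alternative that reads characters and tracks the nesting depth, emitting a statement whenever the depth returns to zero; StatementSplitter is a hypothetical name, and each returned statement still needs a lexer and parser.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Sketch: split the source into top-level statements by counting parenthesis depth.
class StatementSplitter {
    static List<String> split(Reader in) throws IOException {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;
        int ch;
        while ((ch = in.read()) != -1) {
            char c = (char) ch;
            // Whitespace between statements is layout only.
            if (depth == 0 && Character.isWhitespace(c)) continue;
            if (depth == 0 && c != '(') {
                throw new IOException("statement must start with '(' but found '" + c + "'");
            }
            current.append(c); // whitespace inside a statement is kept for the lexer
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth == 0) { // the outermost ')' closes the statement
                    statements.add(current.toString());
                    current.setLength(0);
                }
            }
        }
        if (depth != 0) throw new IOException("unbalanced parentheses at end of input");
        return statements;
    }
}
```

With it, the loop from the question could become for (String stmt : StatementSplitter.split(reader)) Interpreter.computeStatement(stmt);, where Interpreter.computeStatement is the question's own hypothetical method. Depth counting is exactly the bookkeeping a regular expression cannot do for unbounded nesting.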

Answer 1

Solution proposed by Ira Baxter:

Your title is extremely confused. You appear to want to parse what are commonly called "S-expressions" in the LISP world; this takes a (simple, but) context-free grammar. You cannot parse such expressions with regexps, since a regular expression cannot match arbitrarily deep nesting of parentheses. Time to learn about real parsers.

Maybe this will help: stackoverflow.com/a/2336769/120163
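The practical difference between a regexp and a "real parser" here is recursion: when a recursive-descent parser meets a nested "(", it simply calls itself. A minimal sketch of an S-expression reader in Java, assuming atoms are returned as Strings and lists as List&lt;Object&gt;; SExprReader is a hypothetical name, and it does not check SET/GET/MUL, which would be a later pass over the resulting structure.

```java
import java.util.ArrayList;
import java.util.List;

// Recursive-descent reader for S-expressions: atom -> String, list -> List<Object>.
class SExprReader {
    private final String src;
    private int pos = 0;

    private SExprReader(String src) { this.src = src; }

    // Reads exactly one expression; anything after it (except whitespace) is an error.
    static Object read(String src) {
        SExprReader r = new SExprReader(src);
        Object result = r.readExpr();
        r.skipWhitespace();
        if (r.pos != src.length()) throw new IllegalArgumentException("trailing input at " + r.pos);
        return result;
    }

    private void skipWhitespace() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    // expr ::= atom | "(" expr* ")"
    private Object readExpr() {
        skipWhitespace();
        if (pos >= src.length()) throw new IllegalArgumentException("unexpected end of input");
        char c = src.charAt(pos);
        if (c == '(') {
            pos++; // consume "("
            List<Object> items = new ArrayList<>();
            while (true) {
                skipWhitespace();
                if (pos >= src.length()) throw new IllegalArgumentException("missing ')'");
                if (src.charAt(pos) == ')') { pos++; return items; } // consume ")"
                items.add(readExpr()); // recursion handles any nesting depth
            }
        }
        if (c == ')') throw new IllegalArgumentException("unexpected ')' at " + pos);
        int start = pos; // atom: run of characters up to whitespace or a parenthesis
        while (pos < src.length() && !Character.isWhitespace(src.charAt(pos))
                && src.charAt(pos) != '(' && src.charAt(pos) != ')') pos++;
        return src.substring(start, pos);
    }
}
```

SExprReader.read("( SET var2 ( MUL 30 5 ) )") returns a list that prints as [SET, var2, [MUL, 30, 5]], and the oddly formatted statement from the question reads as [SET, var, 25].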

Answer 2

In the end, thanks to Ira Baxter, I understood that this context-free grammar can't be parsed with a RegExp, and I used the concepts of S-expressions to build the interpreter, whose source code you can find here. If you have any questions about it (mainly because the comments aren't translated into English, even though I think the code is pretty clear), just message me or comment here.

Basically, what I do is:

Read every character and tokenize it (e.g. '(' becomes OPEN_PAR, "SET" becomes STATEMENT_SET, and a random letter like 'b' is parsed as a VARIABLE).
Then I use the resulting token list for syntactic analysis, which checks the patterns occurring in the token list against the grammar.
If a statement contains an expression, I recursively check any expression nested inside it, throwing an exception and skipping to the next correct statement if needed.
After analysing each statement, I evaluate it as the specification requires.
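The four steps above can be approximated in one small Java class. This is a sketch, not the author's linked code: it assumes only SET, GET and a binary MUL with non-negative integers, it checks and evaluates in the same pass (the author separates them), and it reports errors as output lines. MiniInterpreter and InterpretError are hypothetical names, and tokenizing by padding parentheses with spaces and splitting on whitespace is a shortcut chosen to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: tokenize, check recursively, evaluate, and recover per statement.
class MiniInterpreter {
    static class InterpretError extends RuntimeException {
        InterpretError(String message) { super(message); }
    }

    private final Map<String, Integer> symbols = new HashMap<>(); // the symbol table
    private List<String> tokens = new ArrayList<>();
    private int pos;

    // Returns one output line per statement; a bad statement yields an "error: ..." line.
    List<String> run(String source) {
        // Padding parentheses with spaces makes split() emit them as separate tokens.
        String padded = source.replace("(", " ( ").replace(")", " ) ").trim();
        tokens = new ArrayList<>();
        if (!padded.isEmpty()) tokens.addAll(Arrays.asList(padded.split("\\s+")));
        pos = 0;
        List<String> output = new ArrayList<>();
        while (pos < tokens.size()) {
            int start = pos;
            try {
                output.add(statement());
            } catch (InterpretError e) {
                output.add("error: " + e.getMessage());
                pos = skipStatement(start); // resume after the broken statement
            }
        }
        return output;
    }

    // statement ::= "(" "SET" name expr ")" | "(" "GET" expr ")"
    private String statement() {
        expect("(");
        String keyword = next();
        String result;
        if (keyword.equals("SET")) {
            String name = next();
            if (!name.matches("[A-Za-z]\\w*")) throw new InterpretError("bad variable name " + name);
            int value = expr();
            symbols.put(name, value);
            result = name + " = " + value;
        } else if (keyword.equals("GET")) {
            result = String.valueOf(expr());
        } else {
            throw new InterpretError("unknown statement " + keyword);
        }
        expect(")");
        return result;
    }

    // expr ::= number | name | "(" "MUL" expr expr ")"; nested expressions recurse.
    private int expr() {
        String t = next();
        if (t.equals("(")) {
            String op = next();
            if (!op.equals("MUL")) throw new InterpretError("unknown operator " + op);
            int product = expr() * expr(); // Java evaluates the left operand first
            expect(")");
            return product;
        }
        if (t.equals(")")) throw new InterpretError("unexpected )");
        if (t.matches("\\d+")) return Integer.parseInt(t);
        Integer stored = symbols.get(t);
        if (stored == null) throw new InterpretError("undefined variable " + t);
        return stored;
    }

    private String next() {
        if (pos >= tokens.size()) throw new InterpretError("unexpected end of input");
        return tokens.get(pos++);
    }

    private void expect(String wanted) {
        String got = next();
        if (!got.equals(wanted)) throw new InterpretError("expected " + wanted + " but got " + got);
    }

    // Returns the index just past the broken statement that begins at 'start'.
    private int skipStatement(int start) {
        int i = start;
        if (!tokens.get(i).equals("(")) { // stray token: jump to the next "("
            do { i++; } while (i < tokens.size() && !tokens.get(i).equals("("));
            return i;
        }
        for (int depth = 0; i < tokens.size(); i++) {
            if (tokens.get(i).equals("(")) depth++;
            else if (tokens.get(i).equals(")") && --depth == 0) return i + 1;
        }
        return tokens.size();
    }
}
```

For example, new MiniInterpreter().run("( SET var 25 ) ( GET ( MUL var 5 ) ) ( SET var2 ( MUL 30 5 ) )") returns [var = 25, 125, var2 = 150], while "( SET a ( ADD 1 2 ) ) ( GET 7 )" produces an error line for the first statement and still evaluates the second to 7.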


2 answers

Answer 1
1 vote, 2015-08-27 09:16:14

Answer 2
1 vote, accepted, 2015-10-04 18:26:57
