
Parsing regular expressions based on a context free grammar

Good evening, Stack Overflow. I'd like to develop an interpreter for expressions based on a pretty simple context-free grammar:

Grammar

Basically, the language consists of two base statements, SET and GET:

( SET var 25 ) // Output: var = 25
( GET ( MUL var 5 ) ) // Output: 125
( SET var2 ( MUL 30 5 ) ) // Output: var2 = 150

Now, I'm pretty sure about what I should do in order to interpret a statement: 1) lexical analysis to turn a statement into a sequence of tokens; 2) syntax analysis to get a symbol table (a HashMap with the variables and their values) and a syntax tree (to perform the GET statements); 3) an inorder visit of the tree to get the results I want.

I'd like some advice on the parsing method to read the source file. Considering that the parser should ignore any whitespace, tab or newline, is it possible to use a Java Pattern to get a general statement I want to analyze? Is there a good way to read a weirdly formatted (and possibly more complex) statement like this

(
  SET var

 25
 )

without confusing the parser with the open and closed parentheses?

For example:

Scanner scan; // scanner reading the source file
String pattern = "..."; // ideal pattern I've found to represent an expression
while (scan.hasNext(pattern))
  Interpreter.computeStatement(scan.next(pattern));

Would this be a viable option for this problem?

Solution proposed by Ira Baxter:

Your title is extremely confused. You appear to want to parse what are commonly called "S-expressions" in the LISP world; this takes a (simple but) context-free grammar. You cannot parse such expressions with regexps. Time to learn about real parsers.

Maybe this will help: stackoverflow.com/a/2336769/120163
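
To illustrate the point about real parsers, here is a minimal sketch of a recursive-descent reader for S-expressions. A regular expression is only used to split the input into tokens (which also takes care of spaces, tabs and newlines); the nesting of parentheses is handled by recursion, which is the part a regexp cannot do. The class name SExprReader, the method names and the choice of nested lists of strings as the result are assumptions made for this example, not something taken from the post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class SExprReader {
    // A token is "(", ")" or a maximal run of characters that are
    // neither whitespace nor parentheses. Whitespace is never matched,
    // so it is skipped by Matcher.find().
    private static final Pattern TOKEN = Pattern.compile("[()]|[^\\s()]+");

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private SExprReader(String source) {
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            tokens.add(m.group());
        }
    }

    // Reads exactly one expression: either an atom (String)
    // or a parenthesised list (List<Object>) of expressions.
    static Object read(String source) {
        SExprReader reader = new SExprReader(source);
        Object expr = reader.readExpr();
        if (reader.pos != reader.tokens.size()) {
            throw new IllegalArgumentException(
                "Unexpected token: " + reader.tokens.get(reader.pos));
        }
        return expr;
    }

    private Object readExpr() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("Unexpected end of input");
        }
        String token = tokens.get(pos++);
        if (token.equals(")")) {
            throw new IllegalArgumentException("Unexpected ')'");
        }
        if (!token.equals("(")) {
            return token; // an atom such as SET, var or 25
        }
        List<Object> list = new ArrayList<>();
        while (pos < tokens.size() && !tokens.get(pos).equals(")")) {
            list.add(readExpr()); // recursion handles arbitrary nesting
        }
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("Missing ')'");
        }
        pos++; // consume the closing ")"
        return list;
    }

    public static void main(String[] args) {
        // The oddly formatted statement from the question.
        System.out.println(read("(\n  SET var\n\n 25\n )"));
        System.out.println(read("( SET var2 ( MUL 30 5 ) )"));
    }
}
```

With this approach the layout of the source does not matter: the multi-line statement from the question and its one-line form produce the same token sequence, and an unbalanced parenthesis is reported as an error instead of silently mismatching.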

In the end, I understood thanks to Ira Baxter that this context-free grammar can't be parsed with RegExp, and I used the concepts of S-expressions to build the interpreter, whose source code you can find here. If you have any questions about it (mainly because the comments aren't translated into English, even though I think the code is pretty clear), just message me or comment here.

Basically what I do is:

  • Parse every character and tokenize it (e.g. '(' becomes OPEN_PAR, "SET" becomes STATEMENT_SET, and a single letter like 'b' is parsed as a VARIABLE)
  • Then I use the resulting token list to do a syntactic analysis, which checks the patterns occurring in the token list against the grammar
  • If there's an expression inside the statement, I check recursively for any expression nested inside it, throwing an exception and skipping to the next correct statement if needed
  • After analysing each statement, I compute it as required by the specification
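
The steps above can be sketched roughly as follows. This is not the linked source code, just a small illustration under a few assumptions: the grammar is inferred from the three sample statements in the question (SET, GET and MUL only), values are integers, tokens are plain strings rather than typed tokens like OPEN_PAR, and the first error aborts the run instead of skipping to the next correct statement. The class name MiniInterpreter and all method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; grammar assumed from the sample statements:
//   statement := "(" "SET" name expr ")" | "(" "GET" expr ")"
//   expr      := number | name | "(" "MUL" expr expr ")"
class MiniInterpreter {
    private static final Pattern TOKEN = Pattern.compile("[()]|[^\\s()]+");

    private final Map<String, Integer> symbols = new HashMap<>(); // symbol table
    private List<String> tokens;
    private int pos;

    // Tokenizes the source, then interprets statements one by one.
    // Returns one output line per statement.
    List<String> run(String source) {
        tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            tokens.add(m.group());
        }
        pos = 0;
        List<String> output = new ArrayList<>();
        while (pos < tokens.size()) {
            output.add(statement());
        }
        return output;
    }

    private String statement() {
        expect("(");
        String keyword = next();
        String result;
        if (keyword.equals("SET")) {
            String name = next();
            if (name.equals("(") || name.equals(")")) {
                throw new IllegalArgumentException("Expected a variable name");
            }
            int value = expr();
            symbols.put(name, value);
            result = name + " = " + value;
        } else if (keyword.equals("GET")) {
            result = String.valueOf(expr());
        } else {
            throw new IllegalArgumentException("Unknown statement: " + keyword);
        }
        expect(")");
        return result;
    }

    // Evaluates an expression; nested expressions are handled recursively.
    private int expr() {
        String token = next();
        if (token.equals("(")) {
            expect("MUL");
            int left = expr();
            int right = expr();
            expect(")");
            return left * right;
        }
        if (token.matches("-?\\d+")) {
            return Integer.parseInt(token);
        }
        Integer value = symbols.get(token);
        if (value == null) {
            throw new IllegalArgumentException("Undefined variable: " + token);
        }
        return value;
    }

    private String next() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("Unexpected end of input");
        }
        return tokens.get(pos++);
    }

    private void expect(String expected) {
        String actual = next();
        if (!actual.equals(expected)) {
            throw new IllegalArgumentException(
                "Expected " + expected + " but found " + actual);
        }
    }

    public static void main(String[] args) {
        String program =
            "( SET var 25 )\n( GET ( MUL var 5 ) )\n( SET var2 ( MUL 30 5 ) )";
        // Prints: var = 25, then 125, then var2 = 150
        new MiniInterpreter().run(program).forEach(System.out::println);
    }
}
```

Here the syntactic analysis and the computation are done in a single pass, so no explicit tree is built; building a tree first, as described in the question, would separate the two steps at the cost of a bit more code.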
