Regular expression for Syntax parser in Java

I need a regular expression for a Java syntax parser that matches the syntax of my programming language, which looks like this:

Variable1={1,2,3}
Variable2=Variable1+{4,5,6}+{}*{2}
Variable3=(Variable2+{1})*Variable1
?Variable3 
?{1,2,3}
?Variable3+{1,2,3}

Assignments of an expression to a variable contain a "=" sign, and evaluations start with a "?" sign. Inside parentheses you can define a new expression, but that expression can contain parentheses again, so it is like a recursive regular definition, which is not possible this way:

String IdPattern = "[a-zA-Z][a-zA-Z0-9]*";            
String SePattern ="\\{"+"([0-9]*)(\\,[0-9]+)*"+"\\}";  


// Problem at the next line:
String CoPattern = "\\(" + ExPattern + "\\)";
// CoPattern depends on ExPattern, which depends on TePattern,
// which depends on FaPattern, which depends on CoPattern again.

String FaPattern= "("+IdPattern+"|"+SePattern+"|"+CoPattern+")";              
String TePattern = FaPattern + "("+ "\\*"+ FaPattern+ ")*" ;   
String ExPattern= "" + TePattern + "(" + "\\+"+ TePattern+")*";  


String AsPattern =  "("+IdPattern+"="+ExPattern+")";  
String PriPattern = "(\\?"+ExPattern +")";                     
String StaPattern = "("+AsPattern+"|"+PriPattern+")";    
String Pro = StaPattern+"$";       
System.out.println("Input=((({20}+{1,2,3})))".matches(Pro));

The problem here is that CoPattern depends on ExPattern, which depends (via TePattern) on FaPattern, which depends on CoPattern itself again. So how do I make this work?

Inside parentheses you can define a new expression, but that expression can contain parentheses again, so it is like a recursive regular definition, which is not possible:

You figured it out yourself: it doesn't seem to work.

Thus the simple answer is: regular expressions are an insufficient tool here. You should very much look into building a real parser instead.

That is not only because of the hard conceptual limitations (see here, for example), but also because building a parser is about more than matching input. One key element of a compiler or parser is giving feedback on invalid input. A regular expression gives you a binary "matches" vs. "does not match" answer. But as a programmer, you want to be told "your input is invalid, and most likely one problem is a missing bracket over here and an invalid identifier over there".
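To make that concrete, here is a minimal sketch of a hand-written recursive-descent recognizer for the language in the question. It is only an illustration under a few assumptions: the grammar is inferred from the regex fragments in the question, whitespace is not handled (the regexes do not allow it either), and the input is only validated, not evaluated. The names SetLanguageParser and check are made up for this example. Each grammar rule becomes one method, so the cycle CoPattern -> ExPattern -> TePattern -> FaPattern -> CoPattern turns into ordinary method recursion, and every failure carries a position.

```java
import java.text.ParseException;

/** Recursive-descent recognizer for the small set language (illustrative sketch). */
final class SetLanguageParser {
    private final String src;
    private int pos = 0;

    private SetLanguageParser(String src) { this.src = src; }

    /** Returns null if the line is a valid statement, otherwise an error message. */
    static String check(String line) {
        SetLanguageParser p = new SetLanguageParser(line);
        try {
            p.statement();
            return null;
        } catch (ParseException e) {
            return e.getMessage() + " at column " + (e.getErrorOffset() + 1);
        }
    }

    // statement := '?' expression | identifier '=' expression
    private void statement() throws ParseException {
        if (peek() == '?') {
            pos++;
        } else {
            identifier();
            expect('=');
        }
        expression();
        if (pos < src.length()) throw error("unexpected '" + peek() + "'");
    }

    // expression := term ('+' term)*
    private void expression() throws ParseException {
        term();
        while (peek() == '+') { pos++; term(); }
    }

    // term := factor ('*' factor)*
    private void term() throws ParseException {
        factor();
        while (peek() == '*') { pos++; factor(); }
    }

    // factor := '(' expression ')' | set | identifier
    private void factor() throws ParseException {
        if (peek() == '(') {
            pos++;
            expression();          // the recursion that a plain regex cannot express
            expect(')');
        } else if (peek() == '{') {
            set();
        } else {
            identifier();
        }
    }

    // set := '{' [ number (',' number)* ] '}'
    private void set() throws ParseException {
        expect('{');
        if (peek() != '}') {
            number();
            while (peek() == ',') { pos++; number(); }
        }
        expect('}');
    }

    // number := [0-9]+
    private void number() throws ParseException {
        int start = pos;
        while (isDigit(peek())) pos++;
        if (pos == start) throw error("expected a number");
    }

    // identifier := [a-zA-Z][a-zA-Z0-9]*
    private void identifier() throws ParseException {
        if (!isLetter(peek())) throw error("expected an identifier");
        pos++;
        while (isLetter(peek()) || isDigit(peek())) pos++;
    }

    // Returns the current character, or '\0' once the input is exhausted.
    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    private void expect(char c) throws ParseException {
        if (peek() != c) throw error("expected '" + c + "'");
        pos++;
    }

    private ParseException error(String message) { return new ParseException(message, pos); }

    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    private static boolean isLetter(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }

    public static void main(String[] args) {
        System.out.println(check("Input=((({20}+{1,2,3})))")); // prints "null" (valid)
        System.out.println(check("?(Variable3+{1}"));          // prints "expected ')' at column 16"
    }
}
```

The same structure extends naturally to a real interpreter: instead of returning void, each method could return the set it computed, and the error messages could be made as detailed as you like.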

So even if you somehow get that approach to work for you, it will give you only a binary answer. And a "proof of concept" isn't the same as having a reasonable, robust foundation to build on.

It is your project, your "new language". You should understand every part of the tooling around it. Coming from there, "I have seen that super complicated regex that supposedly solves my problem, can someone adapt that to my needs" ... is clearly not a good starting point.

Regular expressions are a very helpful and important tool, but they need to be used with care. My personal rule of thumb: when your regex is so complicated that you need other people to explain it to you, or even to write it down for you ... then consider not using a regex. Because you are probably out of your league. And you will be the one who has to maintain that code.
