Alternatives to Regular Expressions
I have a set of strings with numbers embedded in them. They look something like /cal/long/3/4/145:999 or /pa/metrics/CosmicRay/24:4:bgp:EnergyKurtosis. I'd like to have an expression parser that is
Interesting alternative ideas would be useful. I'm also entertaining the idea of just implementing the subset of regular expressions that I need plus the numerical constraints.
Thanks!
There's no reason to reinvent the wheel! The core of a regular expression engine is built on a strong foundation of mathematics and computer science; the reason we continue to use regular expressions today is that they are sound in principle and are unlikely to be improved upon in the foreseeable future.
If you do find or create some alternative parsing language that covers only a subset of what Regex can do, you will quickly have a user asking for a concept that can be expressed in Regex but that your flavor just plain leaves out. Spend your time solving problems that haven't been solved instead!
I'm inclined to agree with Rex M, although your second requirement for numerical constraints complicates things. Unless you allow only very basic constraints, I'm not aware of a way to succinctly express that in a regular expression. If there is such a way, please disregard the rest of my answer and follow the other suggestions here. :)
You might want to consider a parser generator - things like the classic lex and yacc. I'm not really familiar with the Java choices, but here's a list:
http://java-source.net/open-source/parser-generators
If you're not familiar, the standard approach would be to first create a lexer that turns your strings into tokens. Then you would pass those tokens on to a parser that applies your grammar to them and spits out some kind of result.
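To make the lexing step concrete, here is a minimal hand-written sketch in Java. It is only an illustration, not what a parser generator would emit: the class name PathLexer, the three token kinds, and the choice to treat both '/' and ':' as separators are assumptions made for this example, not something taken from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical lexer that splits a path-like string into tokens. */
final class PathLexer {
    enum Kind { SEPARATOR, NUMBER, WORD }

    static final class Token {
        final Kind kind;
        final String text;

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }

        @Override
        public String toString() {
            return kind + "(" + text + ")";
        }
    }

    // One alternative per token kind: a separator ('/' or ':'),
    // a run of digits, or a word that starts with a letter.
    private static final Pattern TOKEN = Pattern.compile(
            "(?<sep>[/:])|(?<num>\\d+)|(?<word>[A-Za-z][A-Za-z0-9]*)");

    /** Converts the input into tokens, rejecting characters no rule covers. */
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException(
                        "Unexpected character at index " + pos);
            }
            if (m.group("sep") != null) {
                tokens.add(new Token(Kind.SEPARATOR, m.group()));
            } else if (m.group("num") != null) {
                tokens.add(new Token(Kind.NUMBER, m.group()));
            } else {
                tokens.add(new Token(Kind.WORD, m.group()));
            }
            pos = m.end();
        }
        return tokens;
    }
}
```

Calling PathLexer.tokenize("/cal/long/3/4/145:999") yields a flat list of separator, word, and number tokens. A parser would then walk that list and apply the grammar, including any numeric checks on the NUMBER tokens.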
In your case, I envision the parser producing a combination of a regular expression and additional conditions. For your numerical constraint example, it might give you the regular expression /cal/long/3/4/143:(\d+) and a constraint to apply to the first grouping (the \d+ portion) that requires that the number lie between 100 and 1100. You'd then apply the RE to your strings to find candidates, and apply the constraint to those candidates to find your matches.
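The two-stage idea (regex first, numeric check second) can be sketched in Java with java.util.regex. This is a minimal illustration under stated assumptions: the class name ConstrainedPattern and its methods are hypothetical, the range is treated as inclusive, and the constraint is hard-wired to the first capture group.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical pairing of a regex with a range check on capture group 1. */
final class ConstrainedPattern {
    private final Pattern pattern;
    private final long min;
    private final long max;

    ConstrainedPattern(String regex, long min, long max) {
        this.pattern = Pattern.compile(regex);
        this.min = min;
        this.max = max;
    }

    /** True when the whole input matches and group 1 lies in [min, max]. */
    boolean matches(String input) {
        Matcher m = pattern.matcher(input);
        if (!m.matches()) {
            return false; // not even a candidate
        }
        try {
            long value = Long.parseLong(m.group(1));
            return value >= min && value <= max;
        } catch (NumberFormatException e) {
            // Digit runs too large for a long are treated as out of range.
            return false;
        }
    }

    /** Keeps only the inputs that pass both the regex and the constraint. */
    List<String> filter(List<String> inputs) {
        return inputs.stream().filter(this::matches).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ConstrainedPattern p =
                new ConstrainedPattern("/cal/long/3/4/143:(\\d+)", 100, 1100);
        List<String> inputs = Arrays.asList(
                "/cal/long/3/4/143:999",   // matches, 999 is in range
                "/cal/long/3/4/143:5000",  // matches the regex, fails the range
                "/cal/long/3/4/145:999");  // fails the regex
        System.out.println(p.filter(inputs));
    }
}
```

Keeping the numeric check outside the regex avoids encoding a range such as 100 to 1100 as digit alternations, which quickly becomes hard to read.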
It's a pretty complicated approach, so hopefully there's a simpler way. I hope that gives you some ideas, at least.
The Java constraint is a severe one. I would recommend using parsing combinators, but you will have to translate the ideas to Java using classes instead of functions. There are many, many papers available on this topic; one of the easiest to approach is Graham Hutton's Higher-Order Functions for Parsing. Hutton's approach makes it especially easy to decide whether to succeed or fail based on conditions like the magnitude of a number, as you show in your example.
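As a rough idea of what that translation could look like, here is a small combinator sketch in Java 8 style. It is not Hutton's formulation: his parsers return a list of possible results, while this simplified version returns at most one via Optional. All names (Result, Parser, Parsers, then, filter, accepts) are hypothetical and chosen for this example only.

```java
import java.util.Optional;
import java.util.function.Predicate;

/** A successful parse: the value produced plus the unconsumed input. */
final class Result<T> {
    final T value;
    final String rest;

    Result(T value, String rest) {
        this.value = value;
        this.rest = rest;
    }
}

/** A parser consumes a prefix of the input or fails with Optional.empty(). */
interface Parser<T> {
    Optional<Result<T>> parse(String input);

    /** Succeeds only if this parser succeeds and its value passes the test. */
    default Parser<T> filter(Predicate<T> test) {
        return in -> parse(in).filter(r -> test.test(r.value));
    }

    /** Runs this parser, then next on what is left, keeping next's value. */
    default <U> Parser<U> then(Parser<U> next) {
        return in -> parse(in).flatMap(r -> next.parse(r.rest));
    }
}

final class Parsers {
    private Parsers() {}

    /** Matches an exact literal prefix. */
    static Parser<String> literal(String text) {
        return in -> {
            if (!in.startsWith(text)) {
                return Optional.empty();
            }
            return Optional.of(new Result<>(text, in.substring(text.length())));
        };
    }

    /** Matches one to eighteen ASCII digits, which always fit in a long. */
    static Parser<Long> number() {
        return in -> {
            int i = 0;
            while (i < in.length() && in.charAt(i) >= '0' && in.charAt(i) <= '9') {
                i++;
            }
            if (i == 0 || i > 18) {
                return Optional.empty();
            }
            return Optional.of(
                    new Result<>(Long.valueOf(in.substring(0, i)), in.substring(i)));
        };
    }

    /** True when the parser succeeds and consumes the entire input. */
    static <T> boolean accepts(Parser<T> parser, String input) {
        return parser.parse(input).map(r -> r.rest.isEmpty()).orElse(false);
    }
}
```

With these pieces, a magnitude check is just another combinator: `Parsers.literal("/cal/long/3/4/143:").then(Parsers.number().filter(n -> n >= 100 && n <= 1100))` builds a parser that fails whenever the number falls outside the range, and `Parsers.accepts` checks that nothing is left over.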
Unfortunately, not all programmers (myself included) are as familiar with RegEx as they ought to be. This often means we end up writing our own string-parsing logic where RegEx could otherwise have served us well.
This isn't always bad. It's possible in some cases to write a DSL (a class, a cohesive set of methods) that's more elegant and readable and meets the precise needs of your problem domain. The trouble is that it can take dozens of iterations to distill the problem into a DSL that is simple and intuitive. And only if the DSL will be used far and wide in the application or by a large community is this trouble warranted. Don't write an elegant solution to a problem that only appears sporadically.
If you're going to go the parser route, check out the GOLD Parsing System. It's often a better option than something like YACC, cleaner looking than pure regexes, and supports Java.
http://goldparser.org/about/how-it-works.htm
http://java-source.net/open-source/parser-generators and http://catalog.compilertools.net/java.html contain catalogs of tools for this. Compare also the Stack Overflow question "How can I parse code to build a compiler in Java?".