
Simplify a complex regular expression

I am looking for a way to simplify a regular expression that matches expressions consisting of values (e.g. 12345), relation signs (<, >, <=, >=) and junctors (&, !). For example, the expression:

>= 12345 & <=99999 & !55555 

should be matched. I have this regular expression:

(^<=|^<= | ^>= | ^>= |^<|^>|^< |^> |^)((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))*

I am especially unhappy with the repetition of <=, >=, <, > at the beginning and end of the expression. I would be glad to get a hint on how to make it simpler, e.g. with lookahead or lookbehind.

Starting from your regex, you can apply these simplification steps:

 (^<=|^<= | ^>= | ^>= |^<|^>|^< |^> |^)((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))*
  1. Move the anchor out of the alternation

     ^(<=|<= |>= |>= |<|>|< |> |)((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))* 

    Why was there whitespace before the anchor? (I removed it.)

  2. Move the trailing whitespace outside the alternation and make it optional

     ^(<=|<=|>=|>=|<|>|<|>|) ?((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))* 
  3. Remove the duplicates in the alternation

     ^(<=|>=|<|>|) ?((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))* 
  4. The empty alternative at the end matches the empty string, so the whole alternation can be made optional instead

     ^((<=|>=|<|>)? ?)?((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))* 
  5. Make the equals sign optional and remove the duplicates

     ^((<|>)=? ?)?((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))* 
  6. An alternation of single characters can be replaced with a character class

     ^([<>]=? ?)?((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))* 
  7. Do similar things with the alternation at the end and you end up with something like this:

     ^([<>]=? ?)?((!|)([0-9]{1,5}))( ?(& ?([<>]=?)?)?|$) 

This is untested. I don't think I changed the semantics, but I wrote it directly here in the editor.
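One way to check whether the steps preserve the semantics is a brute-force comparison of what the step 1 alternation and the step 6 character-class form accept. The following is a minimal Python sketch, not part of the original answer: the variable names are made up for illustration, and only the leading operator group is compared, using `fullmatch` on every short string over the characters `<`, `>`, `=` and space.

```python
import re
from itertools import product

# Leading operator group as written in step 1 (anchor already moved out).
original_prefix = re.compile(r"<=|<= |>= |>= |<|>|< |> |")
# Leading operator group as written in step 6.
simplified_prefix = re.compile(r"([<>]=? ?)?")

# Every string of length 0..3 over the characters the prefix can contain.
alphabet = "<>= "
candidates = {"".join(p) for n in range(4) for p in product(alphabet, repeat=n)}

accepted_original = {s for s in candidates if original_prefix.fullmatch(s)}
accepted_simplified = {s for s in candidates if simplified_prefix.fullmatch(s)}

only_simplified = accepted_simplified - accepted_original
only_original = accepted_original - accepted_simplified
print(sorted(only_simplified), sorted(only_original))
```

Under these assumptions the two sets differ only in `>=` without a trailing space, which the step 1 alternation never lists. Step 2 therefore appears to widen the match slightly, which is most likely what was intended.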

You can make all the spaces optional (with question marks) so you don't have to explicitly list all the possibilities. You can also group the equality/inequality symbols in a character class ([ ]).

Like this, I think:

(^[<>]=?\s?)((!|)([0-9]{1,5}))(\s?&\s?[<>]=?\s|$)*
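Since this pattern is offered tentatively, it may be worth trying it on the sample from the question. The following Python sketch (an illustration only, using Python's `re` module rather than whatever engine the asker uses) distinguishes a match anchored at the start from a match of the whole string.

```python
import re

# Pattern copied from the answer above.
pattern = re.compile(r"(^[<>]=?\s?)((!|)([0-9]{1,5}))(\s?&\s?[<>]=?\s|$)*")

sample = ">= 12345 & <=99999 & !55555"

prefix_match = pattern.match(sample)     # anchored at the start only
whole_match = pattern.fullmatch(sample)  # must consume the entire string

matched_text = prefix_match.group(0) if prefix_match else None
print(repr(matched_text), whole_match is not None)
```

In this sketch only the first term is consumed: the trailing group expects whitespace after the operator and never matches the following value, so the pattern would probably need the value moved inside the repeated group to validate the whole expression.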

How about:

[<>]=?|\\d{1,5}|[&!\\|]

That takes care of your > / >= / < / <= repetition. It seems to work for me.

Let me know if this answers your question or needs work.
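Read as a tokenizer rather than a validator, the pattern above could be used as follows. This is a minimal Python sketch assuming the doubled backslashes in the pattern are string-literal escaping; the `tokenize` helper is a hypothetical name introduced here.

```python
import re

# Pattern from the answer, with the string-literal escaping (\\d) undone.
token_pattern = re.compile(r"[<>]=?|\d{1,5}|[&!\|]")

def tokenize(expression):
    """Return every operator, number and junctor found; other text is skipped."""
    return token_pattern.findall(expression)

tokens = tokenize(">= 12345 & <=99999 & !55555")
print(tokens)
```

Note that `findall` silently skips anything it does not recognize, and a six-digit number is split into two tokens, so the token sequence would still need to be checked if the goal is validation.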

I have a two-step procedure in mind: first split on the junctors, then check the individual parts.

final String expr = ">= 12345 & <=99999 & !55555".replaceAll("\\s+", "");
for (String s : expr.split("[|&]"))
  if (!s.matches("([<>]=?|=|!)?\\d+")) { System.out.println("Invalid"); return; }
System.out.println("Valid");

But we're still left guessing whether you are talking about validation or something else.
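For comparison, the same two-step idea can be sketched in Python. The function name `is_valid` is made up for illustration. One behavioral difference is worth noting: Python's `re.split` keeps trailing empty strings, whereas Java's `String.split` drops them, so this version rejects a trailing `&`.

```python
import re

# Same per-part pattern as in the Java snippet above.
PART = re.compile(r"([<>]=?|=|!)?\d+")

def is_valid(expression):
    """Strip all whitespace, split on the junctors, then check each part."""
    compact = re.sub(r"\s+", "", expression)
    parts = re.split(r"[|&]", compact)
    return all(PART.fullmatch(part) for part in parts)

print(is_valid(">= 12345 & <=99999 & !55555"))
```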

You seem to be spending a lot of effort matching optional spaces. Something like \s? (0 or 1) or \s* (0 or more) would be better.

Also, repeated items separated by something are always difficult. It's best to make a regexp for the "thing" to simplify the repetition.

limit = '\s*([<>]=?|!)\s*\d{1,5}\s*'
one_or_more = '^' + limit + '(&' + limit + ')*$'

or, expanded out:

^\s*([<>]=?|!)\s*\d{1,5}\s*(&\s*([<>]=?|!)\s*\d{1,5}\s*)*$

Also, ! is a "relation sign" and not a "junctor", if I am understanding correctly.

(For the people advocating a "real" parser: the above, i.e. the structure of one_or_more, is probably how you would end up implementing the &-separated list anyway; there's no need for a parser if you can just use string concatenation in the language.)
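The composition above can be tried out directly in Python. This is a small sketch in which raw strings are used so the backslashes reach the regex engine unchanged; the `is_valid` wrapper is a hypothetical helper added here.

```python
import re

# One "thing": optional spaces, a relation sign or !, then 1-5 digits.
LIMIT = r"\s*([<>]=?|!)\s*\d{1,5}\s*"
# One or more things separated by &, anchored at both ends.
ONE_OR_MORE = re.compile("^" + LIMIT + "(&" + LIMIT + ")*$")

def is_valid(expression):
    return ONE_OR_MORE.match(expression) is not None

print(is_valid(">= 12345 & <=99999 & !55555"))
```

As written, `LIMIT` requires a relation sign or `!` before every value, so a bare number such as `12345` is rejected. If bare values should be allowed, the operator group would need a trailing `?`.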

This is what you want:

^(\s*([<>]=?)?\s*!?\d{1,5}\s*(&|$))*

These explanations of some of the subexpressions should help you understand the whole thing:

\s* : 0 or more whitespace characters
([<>]=?)? : A < or > sign optionally followed by an =, all optional
!? : An optional !
\d{1,5} : 1-5 digits
(&|$) : Either an & or the end of the string
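A quick way to probe this pattern is the Python sketch below. The `is_valid` helper is a made-up name, and `fullmatch` is an assumption about how the pattern is meant to be applied.

```python
import re

# Pattern copied from the answer above.
PATTERN = re.compile(r"^(\s*([<>]=?)?\s*!?\d{1,5}\s*(&|$))*")

def is_valid(expression):
    # fullmatch requires the pattern to consume the entire input.
    return PATTERN.fullmatch(expression) is not None

print(is_valid(">= 12345 & <=99999 & !55555"))
```

The pattern has no closing anchor and its outer group may repeat zero times, so a plain `match` succeeds with an empty match on any input; that is why `fullmatch` is used here. Even then, the empty string and an expression ending in `&` are accepted, which may or may not be desired.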
