Simplifying a complex regular expression

Question

I am looking for a way to simplify a regular expression which consists of values (e.g. 12345), relation signs (<, >, <=, >=) and junctors (&, !). For example, the expression:

>= 12345 & <=99999 & !55555

should be matched. I have this regular expression:

(^<=|^<= | ^>= | ^>= |^<|^>|^< |^> |^)((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))*

I am especially unhappy with the repetition of <=, >=, <, > at the beginning and end of the expression. I would be glad to get a hint on how to make it simpler, e.g. with lookahead or lookbehind.

Answer 1

Starting from your regex, you can apply these simplification steps:

 (^<=|^<= | ^>= | ^>= |^<|^>|^< |^> |^)((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))*

Move the anchor out of the alternation
```
 ^(<=|<= |>= |>= |<|>|< |> |)((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))* 
```
Why was there whitespace before the anchor? (I removed it.)
Move the trailing whitespace outside the alternation and make it optional
```
 ^(<=|<=|>=|>=|<|>|<|>|) ?((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))* 
```

Remove the duplicates in the alternation

 ^(<=|>=|<|>|) ?((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))*

The empty alternative at the end matches the empty string, so the whole alternation is optional
```
 ^((<=|>=|<|>)? ?)?((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))* 
```

Make the equal sign optional and remove the duplicates

 ^((<|>)=? ?)?((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))*

An alternation of single characters can be replaced with a character class
```
 ^([<>]=? ?)?((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))* 
```
Do similar things with the alternation at the end and you end up with something like this:
```
 ^([<>]=? ?)?((!|)([0-9]{1,5}))( ?(& ?([<>]=?)?)?|$) 
```

This is untested. I don't think I changed the semantics, but I did this right here in the editor.
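Since the steps above were not tested, a brute-force comparison is a cheap way to see whether a rewrite changed what is accepted. The sketch below, in Python, compares only the leading alternation (with the anchor already moved out) against its character-class form, over every string of up to three characters drawn from `<`, `>`, `=` and space. The names `BEFORE`, `AFTER` and `accepted` are made up for this illustration.

```python
import itertools
import re

# Leading alternation after moving the anchor out, and its final simplified form.
BEFORE = r"(<=|<= |>= |>= |<|>|< |> |)"
AFTER = r"([<>]=? ?)?"


def accepted(pattern, alphabet="<>= ", max_len=3):
    """Return every string over `alphabet` (up to `max_len` chars) that fully matches."""
    compiled = re.compile(pattern)
    result = set()
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            candidate = "".join(chars)
            if compiled.fullmatch(candidate):
                result.add(candidate)
    return result


# Strings accepted by exactly one of the two patterns.
difference = accepted(BEFORE) ^ accepted(AFTER)
print(sorted(difference))  # ['>=']
```

Under these assumptions the two forms differ only on `>=` without a trailing space, which the original alternation never lists. So the simplification slightly widens what is accepted, which is presumably what was wanted anyway.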

Answer 2

You can make all the spaces optional (with question marks) so you don't have to explicitly list all the possibilities. You can also group the equality/inequality symbols in a character class ([ ]).

Like this, I think:

(^[<>]=?\s?)((!|)([0-9]{1,5}))(\s?&\s?[<>]=?\s|$)*

Answer 3

How about:

[<>]=?|\\d{1,5}|[&!\\|]

That takes care of your > / >= / < / <= repetition. Seems to work for me.
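For what it's worth, a pattern of this shape matches one token at a time rather than the whole expression, so it is better suited to tokenizing than to validating. A minimal Python sketch, assuming the doubled backslashes in the pattern are string-literal escaping (the names `TOKEN` and `tokenize` are hypothetical):

```python
import re

# The pattern from the answer, written as a raw string so no double escaping is needed.
TOKEN = re.compile(r"[<>]=?|\d{1,5}|[&!\|]")


def tokenize(expression):
    """Return the relation signs, numbers and junctors found in `expression`."""
    return TOKEN.findall(expression)


print(tokenize(">= 12345 & <=99999 & !55555"))
# ['>=', '12345', '&', '<=', '99999', '&', '!', '55555']
```

Note that `findall` silently skips anything the pattern does not recognize, so `tokenize("abc 1234567")` returns `['12345', '67']` instead of reporting an error. A separate check would still be needed to reject bad input.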

Let me know if this answers your question or needs work.

Answer 4

I have a two-step procedure in mind: first split on the junctors, then check the individual parts.

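public class ValidateExpression {
  public static void main(String[] args) {
    final String expr = ">= 12345 & <=99999 & !55555".replaceAll("\\s+", ""); // drop all whitespace
    for (String s : expr.split("[|&]")) // step 1: split on the junctors
      if (!s.matches("([<>]=?|=|!)?\\d+")) { System.out.println("Invalid"); return; } // step 2: check each part
    System.out.println("Valid");
  }
}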

But we're still left guessing whether you are talking about validation or something else.

Answer 5

You seem to be spending a lot of effort matching optional spaces. Something like \s? (0 or 1) or \s* (0 or more) would be better.

Also, repeated items separated by something are always awkward. It's best to write a regexp for the "thing" itself to simplify the repetition.

limit = '\s*([<>]=?|!)\s*\d{1,5}\s*'
one_or_more = '^' + limit + '(&' + limit + ')*$'

or, expanded out:

^\s*([<>]=?|!)\s*\d{1,5}\s*(&\s*([<>]=?|!)\s*\d{1,5}\s*)*$

Also, ! is a "relation sign" and not a "junctor", if I am understanding correctly.

(For the people advocating a "real" parser: the structure of one_or_more above is probably how you would end up implementing the &-separated list anyway; there's no need for a parser if you can just use string concatenation in the language.)
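The `limit` / `one_or_more` composition can be tried more or less as written. Here is a small Python sketch; the upper-case constant names and the `is_valid` helper are assumptions added for illustration, not part of the answer.

```python
import re

# One "limit": a mandatory relation sign or '!', then 1-5 digits,
# with optional whitespace around each part.
LIMIT = r"\s*([<>]=?|!)\s*\d{1,5}\s*"

# One limit followed by any number of '&'-separated limits.
ONE_OR_MORE = re.compile("^" + LIMIT + "(&" + LIMIT + ")*$")


def is_valid(expression):
    """Return True if `expression` is a well-formed '&'-separated list of limits."""
    return ONE_OR_MORE.match(expression) is not None


print(is_valid(">= 12345 & <=99999 & !55555"))  # True
print(is_valid(">= 12345 &"))                   # False: dangling junctor
print(is_valid("12345"))                        # False: the sign is mandatory here
```

Because the sign (or `!`) is not optional in `limit`, a bare number is rejected. If bare numbers should be allowed, the group could be made optional with a trailing `?`.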

Answer 6

This is what you want:

^(\s*([<>]=?)?\s*!?\d{1,5}\s*(&|$))*

These explanations of some of the subexpressions should help you understand the whole thing:

\s* : zero or more whitespace characters
([<>]=?)? : a < or > sign optionally followed by an =, the whole group optional
!? : an optional !
\d{1,5} : 1 to 5 digits
(&|$) : either an & or the end of the string
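The same breakdown can be kept next to the pattern using Python's `re.VERBOSE` flag. The sketch below is only an illustration (the name `PATTERN` is made up). It also shows a caveat: the outer group may repeat zero times, so a whole-string check such as `fullmatch` is needed for validation.

```python
import re

PATTERN = re.compile(
    r"""
    ^(
        \s*            # zero or more whitespace characters
        ([<>]=?)?      # optional relation sign: <, >, <= or >=
        \s*
        !?             # optional negation
        \d{1,5}        # 1 to 5 digits
        \s*
        (&|$)          # a junctor or the end of the string
    )*
    """,
    re.VERBOSE,
)

sample = ">= 12345 & <=99999 & !55555"
print(PATTERN.fullmatch(sample) is not None)  # True
print(PATTERN.fullmatch("abc") is not None)   # False
print(repr(PATTERN.match("abc").group(0)))    # '' (zero repetitions still "match")
```

With `fullmatch` the pattern still accepts the empty string and a trailing `&` (for example `">= 12345 &"`), so whether that is acceptable depends on what the expression is meant to allow.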

6 answers

Answer 1: 1 2012-05-23 21:25:03
Answer 2: 0 2012-05-23 20:33:32
Answer 3: 0 2012-05-23 20:33:46
Answer 4: 0 2012-05-23 20:39:14
Answer 5: 0 2012-05-23 20:50:18
Answer 6: 0 2012-05-23 20:56:45