
How can I simplify this regex?

Here is a rather complex regex:

^\s*(?:\d{2}|\d{2}\s*\d{2}|\d{2}\s*\d{2}\s*\d{2}|\d{2}\s*\d{2}\s*\d{2}\s*\d{2}|\d{2}\s*\d{2}\s*\d{2}\s*\d{2}\s*\d{2})\s*$

Graphically, it becomes:

[Image: regex visualization]

How can it be reduced?

I have tried positive lookaheads, for example (?=\d{4})[\s\d]+, with no success.

Requirements

The regex:

  • Allows from one to five pairs of numbers.
  • Allows zero or more blank characters between pairs of numbers.

Here is a set of valid inputs the regex must match: https://regex101.com/r/hN0pT4/7

Example

// OK                  // NOK
12                     123
1234                   12 345
123456                 123 45 45
12345678               1 2 3 4 5
1234567890             12 34 56 78 90 11
12 34
12 3456
12 34 56 78
12 34 567890

EDIT Solution: https://stackoverflow.com/a/36361240/363573

What about the literal translation of:

pairs of numbers (max 5 pairs) with zero or more spaces between

that is:

^\s*(\d{2}\s*){1,5}\s*$

You can see a demo here.
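As a quick check, the pattern above can be run against the examples from the question. This is only a minimal Python sketch: the OK/NOK lists are copied from the question, and the helper name is_valid is made up for illustration.

```python
import re

# Pattern from the answer above: one to five digit pairs with optional whitespace.
PAIRS = re.compile(r"^\s*(\d{2}\s*){1,5}\s*$")

# Sample inputs copied from the question.
OK = ["12", "1234", "123456", "12345678", "1234567890",
      "12 34", "12 3456", "12 34 56 78", "12 34 567890"]
NOK = ["123", "12 345", "123 45 45", "1 2 3 4 5", "12 34 56 78 90 11"]


def is_valid(text: str) -> bool:
    """Return True if text consists of one to five pairs of digits."""
    return PAIRS.match(text) is not None


for sample in OK + NOK:
    print(f"{sample!r:22} {'OK' if is_valid(sample) else 'NOK'}")
```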

The shortest/simplest is:

^ *(\d\d *){1,5}$

Notes:

  • \d\d (4 chars) is shorter/simpler than \d{2} (5 chars, with quantifier)
  • space char (1 char) is simpler than \s (2 chars)
  • you don't need the trailing \s* because any trailing spaces are consumed by the inner expression

See live demo passing all your posted test cases.

If you really need to allow other whitespace chars (eg tabs), then use:

^\s*(\d\d\s*){1,5}$
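To illustrate the notes above, here is a small Python sketch (the constant names are mine, not part of the answer). It shows that trailing spaces are already consumed inside the group, and that the space-only variant rejects a tab while the \s variant accepts it.

```python
import re

SPACES_ONLY = re.compile(r"^ *(\d\d *){1,5}$")
ANY_WHITESPACE = re.compile(r"^\s*(\d\d\s*){1,5}$")

# Trailing spaces are eaten by the ' *' inside the group,
# so nothing extra is needed before '$'.
print(bool(SPACES_ONLY.match("12 34   ")))   # True

# A tab is not a space character, so only the \s variant accepts it.
print(bool(SPACES_ONLY.match("12\t34")))     # False
print(bool(ANY_WHITESPACE.match("12\t34")))  # True
```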

Here is the best I could produce

^(\d{2} ?){1,5}$

^\s*(\d{2} *){1,5}\s*$   <--- edit: I had forgotten to allow zero or more whitespace characters

Explanation:

^ : Beginning of string

(\d{2} ?) : Matches a pair of digits followed by an optional space

{1,5} : The group can be repeated one to five times (5 pairs max)

$ : End of string


Regex101

Let's break it down:

  • ^\s*(?:x)\s*$ is easy enough: start of the input, any whitespace, group x, any whitespace, end - not much to simplify here.
  • now group x: \d{2}|\d{2}\s*\d{2}|\d{2}\s*\d{2}\s*\d{2}|\d{2}\s*\d{2}\s*\d{2}\s*\d{2}|\d{2}\s*\d{2}\s*\d{2}\s*\d{2}\s*\d{2}

    If you split it at the pipes (i.e. the "or" operators) you get this:
    • \d{2}
    • \d{2}\s*\d{2}
    • \d{2}\s*\d{2}\s*\d{2}
    • \d{2}\s*\d{2}\s*\d{2}\s*\d{2}
    • \d{2}\s*\d{2}\s*\d{2}\s*\d{2}\s*\d{2}

See a pattern? They all start with \d{2}, and each alternative adds one more \s*\d{2}, up to 4 times. So this can be simplified to \d{2}(?:\s*\d{2}){0,4}

Putting it together you get ^\s*(?:\d{2}(?:\s*\d{2}){0,4})\s*$
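A rough way to sanity-check this factoring is to run the original alternation and the simplified pattern over the same inputs and confirm they agree. The Python sketch below does that on the question's samples plus a few whitespace variants I added. It is a spot check, not a proof of equivalence.

```python
import re

# The original regex from the question, split across lines for readability.
ORIGINAL = re.compile(
    r"^\s*(?:\d{2}|\d{2}\s*\d{2}|\d{2}\s*\d{2}\s*\d{2}"
    r"|\d{2}\s*\d{2}\s*\d{2}\s*\d{2}"
    r"|\d{2}\s*\d{2}\s*\d{2}\s*\d{2}\s*\d{2})\s*$"
)
# The factored form: one pair, then up to four more pairs.
SIMPLIFIED = re.compile(r"^\s*(?:\d{2}(?:\s*\d{2}){0,4})\s*$")

SAMPLES = [
    "12", "1234", "123456", "12345678", "1234567890",
    "12 34", "12 3456", "12 34 56 78", "12 34 567890",
    "123", "12 345", "123 45 45", "1 2 3 4 5", "12 34 56 78 90 11",
    "", "  12  ", "12\t34",  # extra cases, not from the question
]

agree = all(
    bool(ORIGINAL.match(s)) == bool(SIMPLIFIED.match(s)) for s in SAMPLES
)
print(agree)  # True
```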

You can try this

^\s*((?:\d{2}\s*){1,5})$

Explanation as per comment ( Regex Breakdown )

^ #Start of string
 \s* #Consume any leading whitespace
 (    #Capturing group holding everything after the leading whitespace (not needed if you only want to test whether the string matches)
   (?:\d{2}\s*){1,5} #Non capturing group to check the pattern
 )
$ #End of string

Regex Demo
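To show what the capturing group gives you, here is a small Python sketch with a made-up input string. Note that group 1 includes any trailing whitespace, because the \s* sits inside the repeated group.

```python
import re

CAPTURING = re.compile(r"^\s*((?:\d{2}\s*){1,5})$")

match = CAPTURING.match("  12 34 56 ")
captured = match.group(1)

# Leading whitespace is outside the group; trailing whitespace is inside it.
print(repr(captured))  # '12 34 56 '

# The individual pairs can then be pulled out of the captured text.
pairs = re.findall(r"\d{2}", captured)
print(pairs)  # ['12', '34', '56']
```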

Here's yet another way:

(\d\s*\d\s*){1,5}

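Correctly classifies all of the OP's examples, provided the whole string must match (anchor it as ^(\d\s*\d\s*){1,5}$ or use a full-match API). Note that it also allows whitespace inside a pair, e.g. 1 2: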

match 12
match 1234
match 123456
match 12345678
match 1234567890
match 12 34
match 12 3456
match 12 34 56 78
match 12 34 567890
no match 123
no match 12 345
no match 123 45 45
no match 1 2 3 4 5
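Since this pattern has no ^ or $, the result depends on how it is applied. A short Python sketch of the difference, using re.search versus re.fullmatch:

```python
import re

LOOSE = re.compile(r"(\d\s*\d\s*){1,5}")

# Unanchored search finds "12 34" inside the invalid input.
print(bool(LOOSE.search("12 345")))     # True

# Requiring the whole string to match rejects it (odd number of digits).
print(bool(LOOSE.fullmatch("12 345")))  # False

# Whitespace is allowed inside a pair, unlike in the original regex.
print(bool(LOOSE.fullmatch("1 2")))     # True
```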

In your situation, a repeated pattern is required. You could try:

^(\s*\d{2}(?:[^\S\n]*\d{2}){0,4}\s*)$

REGEX 101 DEMO
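The [^\S\n] class means "any whitespace except a newline", so pairs must sit on the same line. A minimal Python sketch of that behaviour (the inputs are my own examples):

```python
import re

SAME_LINE = re.compile(r"^(\s*\d{2}(?:[^\S\n]*\d{2}){0,4}\s*)$")

# Spaces and tabs between pairs are accepted.
print(bool(SAME_LINE.match("12 \t34")))  # True

# A newline between pairs is not.
print(bool(SAME_LINE.match("12\n34")))   # False
```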

Here is the final solution I have selected:

^(?:\s*\d{2}){1,5}$

[Image: regex visualization]
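One thing worth knowing about this version: the whitespace sits in front of each pair, so leading whitespace is accepted but trailing spaces are not. The original regex allowed both. A quick Python sketch, with inputs of my own choosing:

```python
import re

FINAL = re.compile(r"^(?:\s*\d{2}){1,5}$")

print(bool(FINAL.match("12 34 567890")))  # True
print(bool(FINAL.match(" 12 34")))        # True: leading space is fine
print(bool(FINAL.match("12 34 ")))        # False: trailing space is not
```

If trailing spaces should be accepted too, adding \s* before the $ would restore that.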

Thank you all, guys!

