Finding specific pattern of numbers using regular expression in python

I am trying to extract a specific pattern of numbers using regular expression in Python 3.7. Below are the 4 possible patterns.

Pattern 1 - The length of this pattern is exactly 10 and cannot start with a zero. These consist of only integers. Ex: '1234567890'

Pattern 2 - The length of this pattern is exactly 11 and can start with a zero. These consist of only integers. Ex: '01234567890'

Pattern 3 - The length of this pattern is exactly 11 and cannot start with a zero. There is one space after the 5th number and all other characters are numbers. Ex: '12345 67890'

Pattern 4 - The length of this pattern is exactly 12 and can start with a zero. There is one space after the 6th number and all other characters are numbers. Ex: '012345 67890'

Note - The examples above are for illustration only. The actual digits in my string can be anything, for example: '2345653340' or '034945 85730' or '000000 00000' or '09876543210'.

Below is what I have tried. For some reason, it does not return the desired results. How do I go about this?

import re

regex = re.compile(r"(\d)?\d\d\d\d\d(\b)?\d\d\d\d\d")

number1 = regex.findall("number is 1234567890") # For Pattern 1 expected output is '1234567890'
number2 = regex.findall("number is 01234567890") # For Pattern 2 expected output is '01234567890'
number3 = regex.findall("number is 12345 67890") # For Pattern 3 expected output is '12345 67890'
number4 = regex.findall("number is 012345 67890") # For Pattern 4 expected output is '012345 67890'

Tested on Regex101 (link); the same pattern in Python:

import re

l = ["number is 1234567890",
"number is 01234567890",
"number is 12345 67890",
"number is 012345 67890",

"number is 912345 67890 - dont match",
"number is 02345 67890 - dont match",
"number is 91234567890 - dont match",
"number is 0234567890 - dont match"]

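# Each alternative is wrapped in \b so it cannot match inside a longer run of digits.
# Order: pattern 4 (leading 0), pattern 3, pattern 2 (leading 0), pattern 1.
pattern = re.compile(r'\b0\d{5}\s\d{5}\b|\b[1-9]\d{4}\s\d{5}\b|\b0\d{10}\b|\b[1-9]\d{9}\b')

for s in l:
    print(pattern.findall(s))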

Prints:

['1234567890']
['01234567890']
['12345 67890']
['012345 67890']
[]
[]
[]
[]
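
As a side note on why the attempt in the question does not return the expected strings, here is a minimal sketch. Two things appear to be going on: when a pattern contains capturing groups, re.findall returns the group contents rather than the whole match, and \b is a zero-width assertion, so (\b)? never consumes the space. The variable name attempt is only for illustration.

```python
import re

# The pattern from the question, unchanged.
attempt = re.compile(r"(\d)?\d\d\d\d\d(\b)?\d\d\d\d\d")

# With capturing groups, findall returns tuples of the groups, not the full match.
print(attempt.findall("number is 1234567890"))   # [('', '')]
print(attempt.findall("number is 01234567890"))  # [('0', '')]

# \b matches a position, not a character, so the space is never consumed.
print(attempt.findall("number is 12345 67890"))  # []

# search() with group(0) shows the full text that was actually matched.
print(attempt.search("number is 1234567890").group(0))  # 1234567890
```

Using non-capturing groups, (?:...), and an explicit space or \s in place of \b avoids both problems.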

You could use an alternation to match the different requirements, and a word boundary \b to prevent the number from being part of a larger word.

\b(?:\d{6} \d{5}|[1-9]\d{4} \d{5}|[1-9]\d{9}|\d{11})\b
  • \b Word boundary
  • (?: Non-capturing group
    • \d{6} \d{5} Pattern 4: six digits 0-9, a space, five digits 0-9
    • | Or
    • [1-9]\d{4} \d{5} Pattern 3: one digit 1-9, four digits 0-9, a space, five digits 0-9
    • | Or
    • [1-9]\d{9} Pattern 1: one digit 1-9, nine digits 0-9
    • | Or
    • \d{11} Pattern 2: eleven digits 0-9
  • ) Close group
  • \b Word boundary

Regex demo | Python demo
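
To try the alternation above in Python, a minimal sketch could look like the following. The names pattern, samples and results are illustrative; the sample strings are the four from the question.

```python
import re

# The alternation from the answer above, compiled once.
pattern = re.compile(r"\b(?:\d{6} \d{5}|[1-9]\d{4} \d{5}|[1-9]\d{9}|\d{11})\b")

samples = [
    "number is 1234567890",    # pattern 1
    "number is 01234567890",   # pattern 2
    "number is 12345 67890",   # pattern 3
    "number is 012345 67890",  # pattern 4
]

# The group is non-capturing, so findall returns the full matches.
results = [pattern.findall(s) for s in samples]
for r in results:
    print(r)
# ['1234567890']
# ['01234567890']
# ['12345 67890']
# ['012345 67890']
```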

Of all the regexes given so far, this one seems the easiest to write and the fastest to run:

import re

# Importing the module avoids shadowing the built-in compile().
regex = re.compile(r'\d{11}|[1-9]\d{9}|[1-9]\d{4}\s\d{5}|\d{6}\s\d{5}')
number1 = regex.findall("number is 1234567890")
number2 = regex.findall("number is 01234567890")
number3 = regex.findall("number is 12345 67890")
number4 = regex.findall("number is 012345 67890")

You get the expected results (findall returns a list of matches):

>>> number1
['1234567890']
>>> number2
['01234567890']
>>> number3
['12345 67890']
>>> number4
['012345 67890']

Andrej Kesely's answer takes 80 steps (regex101.com).
The fourth bird's answer takes 44 steps (regex101.com).
My answer takes 41 steps (regex101.com).
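
One caveat worth checking: the pattern in this answer has no \b anchors, so it can also match inside a longer run of digits. The sketch below compares it with a bounded variant. The 12-digit input and the names unbounded and bounded are made up for illustration, and the bounded variant is an assumption about the intended behavior, not part of the original answer.

```python
import re

# The pattern from this answer, as written.
unbounded = re.compile(r"\d{11}|[1-9]\d{9}|[1-9]\d{4}\s\d{5}|\d{6}\s\d{5}")

# Hypothetical variant: same alternatives wrapped in word boundaries.
bounded = re.compile(r"\b(?:\d{11}|[1-9]\d{9}|[1-9]\d{4}\s\d{5}|\d{6}\s\d{5})\b")

text = "number is 123456789012"  # 12 digits, fits none of the four patterns

print(unbounded.findall(text))  # ['12345678901']
print(bounded.findall(text))    # []
```

Whether the extra anchors matter depends on the input; if the numbers can be embedded in longer digit strings, the bounded form is the safer choice.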


 