
How to efficiently match regex in python

I am writing code to match US phone number formats.

So it should match:

123-333-1111
(123)111-2222
123-2221111

But should not match 1232221111

import re

matchThreeDigits = r"(?:\s*\(?[\d]{3}\)?\s*)"
matchFourDigits = r"(?:\s*[\d]{4}\s*)"
phoneRegex = '(' + '(' + matchThreeDigits + ')' + '-?' + '(' + matchThreeDigits + ')' + '-?' + '(' + matchFourDigits + ')' + ')'
matches = re.findall(phoneRegex, line)  # line is the text being searched

The problem is that I need to make sure at least one of () or '-' is present in the pattern; otherwise the match could be a plain ten-digit number rather than a phone number. I don't want to run a second pattern search, for efficiency reasons. Is there any way to build this requirement into the regex pattern itself?

You can use the following regex:

regex = r'(?:\d{3}-|\(\d{3}\))\d{3}-?\d{4}'

assuming that (123)1112222 is acceptable.

The | acts as an OR, and \( and \) match the literal characters ( and ), respectively.
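As a quick check, here is a minimal sketch that runs the regex above against the four examples from the question. It assumes Python 3; the names PHONE, candidates, and sample_line are made up for illustration. re.fullmatch is used for validating a single string, and re.findall for scanning a longer line as in the question.

```python
import re

# Pattern from the answer above: the area code must either be followed
# by "-" or be wrapped in parentheses, so bare digit runs are rejected.
PHONE = re.compile(r'(?:\d{3}-|\(\d{3}\))\d{3}-?\d{4}')

candidates = ['123-333-1111', '(123)111-2222', '123-2221111', '1232221111']

# fullmatch() succeeds only if the whole string fits the pattern.
results = {s: bool(PHONE.fullmatch(s)) for s in candidates}
for s, ok in results.items():
    print(s, ok)

# findall() scans a longer line (sample_line is a hypothetical input).
sample_line = 'call 123-333-1111 or (123)111-2222, not 1232221111'
found = PHONE.findall(sample_line)
print(found)
```

Because the pattern contains only a non-capturing group, findall returns the full matched strings rather than tuples of groups.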

Something like this?

pattern = r'(\(?(\d{3})\)?(?P<A>-)?(\d{3})(?(A)-?|-)(\d{4}))'

Using it:

import re
regex = re.compile(pattern)
check = ['123-333-1111', '(123)111-2222', '123-2221111', '1232221111']
for number in check:
    match = regex.match(number)
    print(number, bool(match))
    if match:
        # show only the purely numeric groups (area code, prefix, line number)
        print('nums:', tuple(g for g in match.groups() if g and g.isalnum()))

Output
123-333-1111 True
nums: ('123', '333', '1111')
(123)111-2222 True
nums: ('123', '111', '2222')
123-2221111 True
nums: ('123', '222', '1111')
1232221111 False

Note:

You asked for an explanation of (?P<A>-) and (?(A)-?|-):

  • (?P<A>-) : A named capture group called A that captures a literal "-". The general form is (?P<NAME>...).
  • (?(A)-?|-) : A conditional group that checks whether the named group A matched. If it did, the YES branch (-?) is tried; otherwise the NO branch (-) is tried. The general form is (?(NAME)YES|NO).
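To make the conditional concrete, the sketch below (assuming Python 3; the variable names rows and first_dash are illustrative) records for each input whether the pattern matched and what group A captured. When A captured the first dash, the second dash is optional; when A is empty, the second dash is required.

```python
import re

# Same pattern as in the answer above.
pattern = re.compile(r'(\(?(\d{3})\)?(?P<A>-)?(\d{3})(?(A)-?|-)(\d{4}))')

rows = []
for number in ['123-333-1111', '123-2221111', '(123)111-2222', '1232221111']:
    m = pattern.fullmatch(number)
    # group('A') is '-' if the first dash was present, None if it was skipped
    first_dash = m.group('A') if m else None
    rows.append((number, bool(m), first_dash))
    print(number, bool(m), first_dash)
```

The last input fails because group A did not match, so the NO branch demands a dash between the second and third digit groups, and there is none.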

All of this is documented in help(re) in the Python interpreter, and can also be found by searching for Python regular expressions.

import re
# raw string so that \( and \d reach the regex engine unchanged
phoneRegex = re.compile(r"(\({0,1}[\d]{3}\)(?=[\d]{3})|[\d]{3}-)([\d]{3}[-]{0,1}[\d]{4})")
numbers = ["123-333-1111", "(123)111-2222", "123-2221111", "1232221111", "(123)-111-2222"]
for number in numbers:
    print(bool(phoneRegex.match(number)))

Output

True
True
True
False
False

You can see an explanation of this regular expression here: http://regex101.com/r/bA4fH8
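For readers who prefer an inline breakdown, the same expression can be written with re.VERBOSE so that each piece carries a comment. This is only a sketch, assuming Python 3; the names phone_verbose, phone_compact, and outcomes are made up here, and the comments are one reading of the pattern rather than the linked explanation.

```python
import re

phone_verbose = re.compile(r"""
    (                       # group 1: area code, in one of two forms
        \({0,1}[\d]{3}\)    #   optional opening paren, three digits, required closing paren
        (?=[\d]{3})         #   lookahead: a digit must follow, so no dash after the paren
      |
        [\d]{3}-            #   or three digits followed by a required dash
    )
    (                       # group 2: the rest of the number
        [\d]{3}             #   three digits
        [-]{0,1}            #   optional dash
        [\d]{4}             #   four digits
    )
""", re.VERBOSE)

# Compact form from the answer above, for comparison.
phone_compact = re.compile(r"(\({0,1}[\d]{3}\)(?=[\d]{3})|[\d]{3}-)([\d]{3}[-]{0,1}[\d]{4})")

numbers = ["123-333-1111", "(123)111-2222", "123-2221111", "1232221111", "(123)-111-2222"]
outcomes = [bool(phone_verbose.match(n)) for n in numbers]
print(outcomes)

# Both spellings should agree on every sample input.
assert outcomes == [bool(phone_compact.match(n)) for n in numbers]
```

Note that the first alternative makes the opening parenthesis optional but the closing one required, so an input such as 123)111-2222 would also be accepted.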
