Python regex for matching phone numbers

I am new to Python regex and need your help. I'm using the following regex to match phone numbers:

r'^\s*\(?([0-9]{3})[\)\-][\s]*?([0-9]{3})[-]?([0-9]{4})\s*$'

Apparently it accepts most of the valid cases, but it does not reject some formats that should fail. Could someone clarify what I am doing wrong? I guess there is something wrong with how spaces and parentheses are handled in the first part of the number.

It should pass these formats:

'(404) 666-1111'
'(404) 6661111'
'404-666-1111'
'404-6661111'
'404666-1111'
'4046661111'
'  (404)   666-1111  '
'(404)666-1111  '
'  404-666-1111 '
'  404-6661111 '
' 4046661111'

and fail on these:

'+1 (404) 666-1111'
' ( 404)666-1111'
'404.666.1111'
'404 666-1111'
'404 666 1111'
'(404-666-1111'

The key challenge is making sure the parentheses match. Given that regular expressions can't count openers and closers, the usual way to handle that part is to provide two pattern alternatives, one with parentheses and one without:

>>> bool(re.match(r'(\(\d{3}\))|\d{3}', '404'))
True
>>> bool(re.match(r'(\(\d{3}\))|\d{3}', '(404)'))
True
>>> bool(re.match(r'(\(\d{3}\))|\d{3}', '(404'))
False
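One detail worth noting about the session above: re.match only anchors at the start of the string, so a stray closing parenthesis after the digits would go unnoticed. A minimal sketch, assuming the area code is checked on its own; the names AREA_CODE and is_area_code are hypothetical and not part of the original answer:

```python
import re

# Two alternatives: digits wrapped in parentheses, or bare digits.
AREA_CODE = re.compile(r'\(\d{3}\)|\d{3}')

def is_area_code(text):
    """Return True only if the whole string is '(ddd)' or 'ddd'."""
    return AREA_CODE.fullmatch(text) is not None

# match() stops after '404' and ignores the unmatched ')'.
print(bool(AREA_CODE.match('404)')))  # True
# fullmatch() requires the entire string to fit one alternative.
print(is_area_code('404)'))           # False
print(is_area_code('(404)'))          # True
print(is_area_code('(404'))           # False
print(is_area_code('404'))            # True
```

In a full phone-number pattern the same effect comes from the tokens that follow the alternation, which is why the anchoring only matters when the fragment is tested in isolation.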

The all-digit numbers without any parentheses or hyphen do not match because, at the beginning of the pattern, either a ) or a - is required right after the first 3 digits:

^\s*\(?([0-9]{3})[)-]
                 ^^^^   
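The effect of that mandatory character class can be checked directly. The sketch below runs the pattern from the question, unchanged, against four of the question's sample strings; the name ORIGINAL is only an illustrative label:

```python
import re

# The pattern from the question, copied as-is.
ORIGINAL = re.compile(
    r'^\s*\(?([0-9]{3})[\)\-][\s]*?([0-9]{3})[-]?([0-9]{4})\s*$'
)

samples = ['404-666-1111', '404666-1111', '4046661111', '(404-666-1111']
results = {s: bool(ORIGINAL.match(s)) for s in samples}

for text, matched in results.items():
    print(repr(text), matched)
# '404-666-1111' True    (valid, accepted)
# '404666-1111' False    (valid, but no ')' or '-' after the first 3 digits)
# '4046661111' False     (valid, same reason)
# '(404-666-1111' True   (invalid, but '(' is optional and '-' satisfies [)-])
```

The last case shows the other half of the problem: because \(? and [)-] are independent, an opening parenthesis can be "closed" by a hyphen.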

To accept and reject all the examples as required, you might use:

 ^\s*(?:\(\d{3}\)\s*|\d{3})-?\d{3}-?\d{4}\s*$

Explanation

  • ^ Start of string
  • \s* Match 0+ whitespace chars
  • (?: Non-capturing group
    • \(\d{3}\)\s* Match 3 digits between parentheses, followed by optional whitespace chars
    • | Or
    • \d{3} Match 3 digits
  • ) Close group
  • -?\d{3}-? Match 3 digits between optional hyphens
  • \d{4}\s* Match 4 digits and optional whitespace chars
  • $ End of string
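As a quick check, the pattern can be run against both lists from the question. This is a small sketch; the names PHONE, should_pass and should_fail are chosen here for illustration:

```python
import re

PHONE = re.compile(r'^\s*(?:\(\d{3}\)\s*|\d{3})-?\d{3}-?\d{4}\s*$')

should_pass = [
    '(404) 666-1111', '(404) 6661111', '404-666-1111', '404-6661111',
    '404666-1111', '4046661111', '  (404)   666-1111  ', '(404)666-1111  ',
    '  404-666-1111 ', '  404-6661111 ', ' 4046661111',
]
should_fail = [
    '+1 (404) 666-1111', ' ( 404)666-1111', '404.666.1111',
    '404 666-1111', '404 666 1111', '(404-666-1111',
]

# Collect any string the pattern classifies differently from the question.
wrongly_rejected = [s for s in should_pass if not PHONE.match(s)]
wrongly_accepted = [s for s in should_fail if PHONE.match(s)]

print(wrongly_rejected)  # []
print(wrongly_accepted)  # []
```

Both lists come out empty, so the pattern agrees with every example given in the question. Inputs outside those lists, such as '(404)-666-1111', are not covered by this check.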

Note that \s can also match a newline.
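To illustrate the newline remark, the sketch below compares the pattern above with a stricter variant. The variant is an assumption, not part of the original answer: it treats only spaces and tabs as padding and uses re.fullmatch instead of ^ and $. The names PHONE and STRICT are illustrative:

```python
import re

# Pattern from the answer above.
PHONE = re.compile(r'^\s*(?:\(\d{3}\)\s*|\d{3})-?\d{3}-?\d{4}\s*$')

# Assumed stricter variant: [ \t] instead of \s, checked with fullmatch().
STRICT = re.compile(r'[ \t]*(?:\(\d{3}\)[ \t]*|\d{3})-?\d{3}-?\d{4}[ \t]*')

samples = ['404-666-1111\n', '(404)\n666-1111', '  (404)   666-1111  ']
for s in samples:
    print(repr(s), bool(PHONE.match(s)), bool(STRICT.fullmatch(s)))
# '404-666-1111\n' True False
# '(404)\n666-1111' True False
# '  (404)   666-1111  ' True True
```

Whether newlines should be rejected depends on where the input comes from; if each number is already a single stripped line, the \s version is fine.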

What you can do is get rid of the spaces and then build the regex with "|", which means "or". Please note that this regex is incomplete, but the idea is that you can add more | alternatives to it:

import re

# Sample inputs that should be accepted.
numbers_list = [
    '(404) 666-1111', '(404) 6661111', '404-666-1111', '404-6661111',
    '404666-1111', '4046661111', '  (404)   666-1111  ', '(404)666-1111  ',
    '  404-666-1111 ', '  404-6661111 ', ' 4046661111',
]
# Raw string so the backslashes reach the regex engine unchanged.
regex_str = r"^\(\d{3}\) *\d+\-*\d+|^ *\d+\-\d+|^ *\d+|^ *\(\d{3}\)\d+\-\d+"
for number in numbers_list:
    tmp_str = number.replace(" ", "")  # drop every space before matching
    result = re.findall(regex_str, tmp_str)
    print(result)
    print("orig: " + number)
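One caveat with removing every space first: some of the formats the question wants rejected differ from accepted ones only by their spaces, so they become indistinguishable after the replacement. A minimal sketch of that effect, using the question's own lists (the variable names are illustrative):

```python
should_pass = [
    '(404) 666-1111', '(404) 6661111', '404-666-1111', '404-6661111',
    '404666-1111', '4046661111', '  (404)   666-1111  ', '(404)666-1111  ',
    '  404-666-1111 ', '  404-6661111 ', ' 4046661111',
]
should_fail = [
    '+1 (404) 666-1111', ' ( 404)666-1111', '404.666.1111',
    '404 666-1111', '404 666 1111', '(404-666-1111',
]

# Valid numbers as they look once all spaces are removed.
stripped_pass = {s.replace(' ', '') for s in should_pass}

# Invalid numbers that turn into one of those valid forms.
collisions = [s for s in should_fail if s.replace(' ', '') in stripped_pass]
print(collisions)
# [' ( 404)666-1111', '404 666-1111', '404 666 1111']
```

If those three must still fail, one option is to remove only the leading and trailing whitespace with str.strip() and let the pattern decide where inner spaces are allowed.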
