Matching phone numbers, regex
I have phone numbers in this format:
some_text phone_number some_text
some_text (888) 501-7526 some_text
Which is the more Pythonic way to search for the phone numbers:
(\(\d\d\d\) \d\d\d-\d\d\d\d)
(\([0-9]+\) [0-9]+-[0-9]+)
Or is there a simpler expression to do this?
I think you are looking for something like this:
(\(\d{3}\) \d{3}-\d{4})
From the Python docs:
{m}
Specifies that exactly m copies of the previous RE should be matched; fewer matches cause the entire RE not to match. For example, a{6} will match exactly six 'a' characters, but not five.
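To make the {m} quantifier concrete, here is a small sketch comparing the repeated form with the counted form. The test strings are made up for illustration.

```python
import re

# Three \d tokens and \d{3} describe the same thing: exactly three digits.
repeated = re.compile(r'\d\d\d')
counted = re.compile(r'\d{3}')

for text in ['888', '88', '8888']:
    # fullmatch succeeds only if the whole string matches the pattern
    print(text, bool(repeated.fullmatch(text)), bool(counted.fullmatch(text)))
```

Both patterns accept '888' and reject the two-digit and four-digit strings, so the counted form is just a shorter spelling of the repeated one.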
(\(\d\d\d\) \d\d\d-\d\d\d\d) would also work, but, as you said in your question, is rather repetitive. Your other suggested pattern, (\([0-9]+\) [0-9]+-[0-9]+), gives false positives on input such as (1) 2-3.
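A quick way to see the false positive is to run both patterns against the same inputs. This is only a sketch; the names loose and strict and the sample strings are mine, not from the question.

```python
import re

# The "+" version accepts any run of digits in each position.
loose = re.compile(r'\([0-9]+\) [0-9]+-[0-9]+')
# The counted version requires 3, 3 and 4 digits respectively.
strict = re.compile(r'\(\d{3}\) \d{3}-\d{4}')

samples = ['call (888) 501-7526 now', 'item (1) 2-3 in the list']
for s in samples:
    print(repr(s), bool(loose.search(s)), bool(strict.search(s)))
```

The loose pattern matches both strings, while the strict one matches only the string containing an actual phone number.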
Using (\(\d{3}\)\s*\d{3}-\d{4}):
>>> import re
>>> s = "some_text (888) 501-7526 some_text"
>>> pat = re.compile(r'(\(\d{3}\)\s*\d{3}-\d{4})')
>>> pat.search(s).group()
'(888) 501-7526'
Explanation:
1st capturing group: (\(\d{3}\)\s*\d{3}-\d{4})
\( matches the character ( literally
\d{3} matches a digit [0-9]
  Quantifier {3}: exactly 3 times
\) matches the character ) literally
\s* matches any whitespace character, such as [\r\n\t\f ]
  Quantifier *: between zero and unlimited times, as many times as possible, giving back as needed [greedy]
\d{3} matches a digit [0-9]
  Quantifier {3}: exactly 3 times
- matches the character - literally
\d{4} matches a digit [0-9]
  Quantifier {4}: exactly 4 times
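If readability is the concern, the same breakdown can live inside the pattern itself. The following is a sketch using re.VERBOSE, which ignores whitespace in the pattern and allows # comments; the name phone_pat is my own.

```python
import re

phone_pat = re.compile(r'''
    (           # start of capture group 1
    \(          # a literal opening parenthesis
    \d{3}       # exactly three digits
    \)          # a literal closing parenthesis
    \s*         # zero or more whitespace characters
    \d{3}       # exactly three digits
    -           # a literal hyphen
    \d{4}       # exactly four digits
    )           # end of capture group 1
''', re.VERBOSE)

s = "some_text (888) 501-7526 some_text"
match = phone_pat.search(s)
print(match.group(1))  # (888) 501-7526
```

Because of \s*, this version also accepts a number with no space after the closing parenthesis, such as (888)501-7526.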
I think the second one would be the more Pythonic way. The one above isn't that easy to read, but then regular expressions aren't that intuitive in general.
(\([0-9]+\) [0-9]+-[0-9]+) will do it if the length of the phone number is not specified. If the length is always the same, you can use (\([0-9]{3}\) [0-9]{3}-[0-9]{4}) or (\(\d{3}\) \d{3}-\d{4}).
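For the fixed-length case, a text containing several numbers could be handled roughly like this. The sample text, including its second phone number, is invented for the example.

```python
import re

# Fixed-length form: 3 digits in parentheses, a space, 3 digits, a hyphen, 4 digits.
PHONE = re.compile(r'\(\d{3}\) \d{3}-\d{4}')

text = "some_text (888) 501-7526 some_text (212) 555-0100 more (1) 2-3"
# With no capture groups in the pattern, findall returns the full matches.
numbers = PHONE.findall(text)
print(numbers)  # ['(888) 501-7526', '(212) 555-0100']
```

The trailing (1) 2-3 is skipped because it does not have the required digit counts.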