Python regex on phone numbers

I'm using the following regex to match phone numbers (it's still under development, so it isn't comprehensive):

\(?\+[\d _\-\.\)\(\+]{8,25}[\d]{1}

When I test it with regex101 or regexpal.com, it matches both +442032398869 and +1 (888) 2572054.
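To make the pattern easier to reason about, here is the same regex written with `re.VERBOSE` and a comment per piece. This is an illustrative rewrite (the names `PHONE_RE` and `sample` are hypothetical); it should behave identically to the one-line version. In verbose mode whitespace is ignored except inside a character class, so the literal space in the class is kept.

```python
import re

# The question's pattern, one commented piece per line.
PHONE_RE = re.compile(r"""
    \(?                     # optional opening parenthesis
    \+                      # a literal plus sign (international prefix)
    [\d _\-\.\)\(\+]{8,25}  # 8 to 25 digits, spaces, underscores, hyphens, dots, parentheses or pluses
    [\d]{1}                 # the match must end on exactly one digit
""", re.VERBOSE)

sample = "blah +442032398869 blah +1 (888) 2572054blah"
print(PHONE_RE.findall(sample))  # ['+442032398869', '+1 (888) 2572054']
```

Because the character class is greedy and also accepts spaces, the engine first overshoots into the trailing space and then backtracks until the final `[\d]{1}` can land on a digit.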

However, when I run it from my Python script, +442032398869 does not match. What could cause this, and how can I fix it?

Bonus question: according to what I've read, I shouldn't need as many escapes inside the first character set. Why does Python's re throw an exception if I remove the backslash in front of the . or the +, for instance?
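On the bonus question, a quick check suggests that `.`, `+`, `(` and `)` can all be left unescaped inside a character class; the backslash that actually matters is the one before `-`. The sketch below (variable names are illustrative) tests this. If the exception came from dropping the `\-` escape along with the others, an unescaped hyphen between `_` and `.` would form a reversed range, which would explain it.

```python
import re

# Inside a character class, '.', '+', '(' and ')' are ordinary characters,
# so the backslashes in front of them are optional.
escaped = re.compile(r'[\d _\-\.\)\(\+]')
unescaped = re.compile(r'[\d _\-.)(+]')

for ch in "0 _-.)(+x":
    # Both classes accept or reject each character in the same way.
    assert bool(escaped.fullmatch(ch)) == bool(unescaped.fullmatch(ch))

# The hyphen is the exception: between two characters it builds a range.
# Unescaped, "_-." means "from '_' (U+005F) to '.' (U+002E)",
# a reversed range, so compilation fails with re.error.
try:
    re.compile(r'[\d _-.)(+]')
except re.error as exc:
    print("re.error:", exc)

# Putting the hyphen last (or first) in the class makes it literal, no escape needed.
hyphen_last = re.compile(r'[\d _.)(+-]')
print(bool(hyphen_last.fullmatch('-')))  # True
```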

EDIT:

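import re

def get_numbers_in_text(html_string):
    # Same pattern as above; the raw string literal must be closed with a quote.
    pattern = r'\(?\+[\d _\-\.\)\(\+]{8,25}[\d]{1}'
    reg = re.compile(pattern, re.IGNORECASE)
    # Search the HTML string that was passed in.
    numbers = reg.findall(html_string)
    return numbers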

The two numbers are in two different HTML files, so I call the function twice, once per file.

Your regex works:

>>> import re
>>> s = 'blah +442032398869 blah +1 (888) 2572054blah'
>>> re.findall(r'\(?\+[\d _\-\.\)\(\+]{8,25}[\d]{1}', s)
['+442032398869', '+1 (888) 2572054']

Your code indicates that you are trying to match numbers in HTML text. Perhaps there is markup separating portions of the number you are trying to match, or perhaps the plus sign is actually a Unicode full-width plus (U+FF0B), or something similar.
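To check which of these causes applies, one option is to inspect the raw text and normalize it before matching. The sketch below works under stated assumptions: `clean_html_text` and `show_non_ascii` are hypothetical helpers, the tag stripping is a crude regex rather than a real HTML parser, and NFKC normalization is one way (not the only one) to fold full-width characters back to ASCII. Note that `\+` only matches the ASCII plus sign, and the space in the class does not match a no-break space.

```python
import html
import re
import unicodedata

PHONE_RE = re.compile(r'\(?\+[\d _\-\.\)\(\+]{8,25}[\d]{1}')


def show_non_ascii(text):
    """List non-ASCII characters so look-alikes such as U+FF0B stand out."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "?"))
            for ch in text if ord(ch) > 127]


def clean_html_text(raw):
    """Hypothetical pre-processing, assuming the number may be hidden by
    HTML entities, tags, or full-width characters."""
    text = html.unescape(raw)                    # "&#43;" -> "+", "&nbsp;" -> U+00A0
    text = re.sub(r"<[^>]+>", "", text)          # crude tag stripper, not a real HTML parser
    return unicodedata.normalize("NFKC", text)   # U+FF0B -> "+", U+00A0 -> " "


samples = [
    "Call \uff0b442032398869 today",     # full-width plus sign
    "Call <b>+44</b>2032398869 today",   # markup splits the number
    "Call &#43;442032398869 today",      # plus sign written as an HTML entity
]
for raw in samples:
    # Before cleaning: [], after cleaning: ['+442032398869'] for each sample.
    print(PHONE_RE.findall(raw), PHONE_RE.findall(clean_html_text(raw)), show_non_ascii(raw))
```

Deleting tags outright joins the text on either side, which is what reassembles the split number here, but it can also glue together unrelated values such as adjacent table cells; replacing tags with a space is the opposite trade-off.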
