Match a regex to the whole string and not just a part of the string

I have a regex: r'((\+91|0)?\s?\d{10})'

I'm trying to match numbers like +91 1234567890, 1234567790, 01234567890.

A number like 1234568901112 shouldn't be matched, because it doesn't start with +91 or 0 and doesn't have exactly 10 digits.

When I try to use re.findall():

re.findall(r'((\+91|0)?\s?\d{10})', '+91 1234567890, 1234567790, 01234567890, 1234568901112')
[('+91 1234567890', '+91'),
 (' 1234567790', ''),
 (' 0123456789', ''),
 (' 1234568901', '')]

You can see that the third and fourth results are not what I want. My expected output for the third result is 01234567890, because it starts with 0 followed by 10 digits, but only the first 10 characters are shown. I also don't want the fourth result at all, because the number doesn't match completely. Either the pattern matches the complete word/string, or it is invalid.

Is there any other regex that I can use? Or a function? What am I doing wrong here?

The expected output is:

['+91 1234567890', '1234567790', '01234567890']

Please let me know if any more clarifications are needed.

You may use

r'(?<!\w)(?:(?:\+91|0)\s?)?\d{10}\b'

See the regex demo.

The point is to match these patterns as whole words. The problem is that the first part is optional and one of the optional alternatives starts with a non-word char (+), so a single \b word boundary at the start won't work here.
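To make the difference concrete, here is a small sketch that runs the same pattern twice, once with a leading \b and once with the (?<!\w) lookbehind. The variable names are only illustrative.

```python
import re

s = '+91 1234567890, 1234567790, 012345678900, 1234568901112, 01234567890'

# Leading \b: there is no word boundary right before '+', so the
# '+91 ' prefix can never be part of a match and is silently dropped.
with_boundary = re.findall(r'\b(?:(?:\+91|0)\s?)?\d{10}\b', s)

# Negative lookbehind: only requires that no word char precedes the match,
# which also holds at the start of the string and after a space.
with_lookbehind = re.findall(r'(?<!\w)(?:(?:\+91|0)\s?)?\d{10}\b', s)

print(with_boundary)    # ['1234567890', '1234567790', '01234567890']
print(with_lookbehind)  # ['+91 1234567890', '1234567790', '01234567890']
```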

Details

  • (?<!\w) - there should be no word char immediately to the left of the current location
  • (?:(?:\+91|0)\s?)? - an optional occurrence of
    • (?:\+91|0) - +91 or 0
    • \s? - an optional whitespace
  • \d{10}\b - ten digits followed by a word boundary, so no word char may come right after them; together with the lookbehind, the number is matched as a whole word
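The same breakdown can be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace and allows comments inside the regex. This is just one way to write it; the name PHONE is made up for the example.

```python
import re

# Same pattern as above, spread out with one comment per part.
PHONE = re.compile(r"""
    (?<!\w)          # no word char immediately to the left
    (?:              # optional prefix:
        (?:\+91|0)   #   +91 or 0
        \s?          #   an optional whitespace
    )?
    \d{10}           # ten digits
    \b               # no word char immediately to the right
""", re.VERBOSE)

s = '+91 1234567890, 1234567790, 012345678900, 1234568901112, 01234567890'
matches = PHONE.findall(s)
print(matches)  # ['+91 1234567890', '1234567790', '01234567890']
```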

Python demo:

import re
s = '+91 1234567890, 1234567790, 012345678900, 1234568901112, 01234567890'
print(re.findall(r'(?<!\w)(?:(?:\+91|0)\s?)?\d{10}\b', s))
# => ['+91 1234567890', '1234567790', '01234567890']
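Since the question also asks about a function: if the input is known to be a comma-separated list, another option is to split it and test each piece with re.fullmatch(), which only succeeds when the whole string matches. The sketch below assumes that comma-separated layout; valid_numbers is a hypothetical helper name, not something from the question.

```python
import re

# No lookbehind or \b needed here: fullmatch() anchors both ends itself.
PHONE = re.compile(r'(?:(?:\+91|0)\s?)?\d{10}')

def valid_numbers(text):
    """Return the comma-separated pieces of text that are complete phone numbers."""
    candidates = (part.strip() for part in text.split(','))
    return [c for c in candidates if PHONE.fullmatch(c)]

s = '+91 1234567890, 1234567790, 012345678900, 1234568901112, 01234567890'
print(valid_numbers(s))
# => ['+91 1234567890', '1234567790', '01234567890']
```

The findall() version above is still the better fit when the numbers are embedded in free text rather than cleanly separated.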
