
Python not matching regexp

>>> pattern = re.compile(r'(.*)\\\\(.*)\\\\(.*)')
>>> m = re.match(pattern, 'string1\string2\string3')
>>> m
>>> 
>>> m.groups
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'groups'

I am trying to match strings with the following format using the regexp above: string1\string2\string3

Above is Python's output. Why is it not returning a match object? Is there anything wrong with my pattern?

The issue is that your pattern uses \\\\ , which in a raw string is four backslash characters and therefore matches two literal backslashes, while the text to be matched has only a single backslash between the parts.

First, you probably want to make your text a raw string. \s is not a recognized escape sequence, so Python happens to keep the backslash here, but a recognized sequence such as \t or \n would be converted to a single character, and recent Python versions warn about unrecognized escapes.

re.match(pattern, r'string1\string2\string3')
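
As a rough illustration of what the r prefix changes, the sketch below compares a raw and a non-raw literal that both contain \t, an escape sequence Python does recognize. The variable names are made up for this example.

```python
# Raw literal: the backslash and the 't' are kept as two separate characters.
raw_text = r'a\tb'
# Non-raw literal: '\t' is converted into a single tab character.
plain_text = 'a\tb'

print(len(raw_text))    # 4
print(len(plain_text))  # 3
print('\\' in raw_text, '\\' in plain_text)  # True False
```

With the raw literal, the backslash is guaranteed to reach re.match unchanged, whatever character follows it.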

Second, you need only two consecutive backslashes in your pattern to represent each single backslash in the text:

pattern = re.compile(r'(.*)\\(.*)\\(.*)')

Finally, rather than m.groups , you want m.groups() (call the method). All together, your code would look like:

import re

pattern = re.compile(r'(.*)\\(.*)\\(.*)')
m = re.match(pattern, r'string1\string2\string3')
m.groups()
# ('string1', 'string2', 'string3')
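
Since re.match returns None when nothing matches (which is what produced the AttributeError in the question), it may help to check the result before calling groups(). Here is a minimal sketch of that check; split_parts is a hypothetical helper name, not something from the question.

```python
import re

pattern = re.compile(r'(.*)\\(.*)\\(.*)')

def split_parts(text):
    """Return the three captured parts, or None when the text does not match."""
    m = pattern.match(text)
    if m is None:
        # No match: there is no match object to call groups() on.
        return None
    return m.groups()

print(split_parts(r'string1\string2\string3'))  # ('string1', 'string2', 'string3')
print(split_parts('no backslashes here'))       # None
```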

The problem is that you are trying to escape backslashes inside a raw string. From the Python docs:

When an 'r' or 'R' prefix is present, a character following a backslash is included in the string without change, and all backslashes are left in the string.

This means that all 8 backslashes stay in your regex, and each pair matches a single literal backslash. The regex therefore requires two consecutive backslashes between the parts, where your test string has only one. It can be fixed by replacing your regex with

r'(.*)\\(.*)\\(.*)'
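
To make the difference concrete, the following sketch runs both patterns against a string with single backslashes and one with doubled backslashes. The names original, fixed, single and double are chosen only for this illustration.

```python
import re

# Pattern from the question: four backslashes per separator, i.e. two literal backslashes.
original = re.compile(r'(.*)\\\\(.*)\\\\(.*)')
# Corrected pattern: two backslashes per separator, i.e. one literal backslash.
fixed = re.compile(r'(.*)\\(.*)\\(.*)')

single = r'string1\string2\string3'    # one backslash between the parts
double = r'string1\\string2\\string3'  # two backslashes between the parts

print(original.match(single))           # None
print(original.match(double).groups())  # ('string1', 'string2', 'string3')
print(fixed.match(single).groups())     # ('string1', 'string2', 'string3')
```

The original pattern only succeeds on text that really contains doubled backslashes, which is why the match in the question came back as None.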
