Python regular expressions: re.search() vs. re.findall()

Question

For school I'm supposed to write a Python RE script that extracts IP addresses. The regular expression I'm using seems to work with re.search() but not with re.findall().

import re

exp = r"(\d{1,3}\.){3}\d{1,3}"
ip = "blah blah 192.168.0.185 blah blah"
match = re.search(exp, ip)
print(match.group())

The match for that is always 192.168.0.185, but it's different when I do re.findall():

exp = r"(\d{1,3}\.){3}\d{1,3}"
ip = "blah blah 192.168.0.185 blah blah"
matches = re.findall(exp, ip)
print(matches[0])

0.

I'm wondering why re.findall() yields 0. when re.search() yields 192.168.0.185, since I'm using the same expression for both functions.

And what can I do to make re.findall() actually follow the expression correctly? Or am I making some kind of mistake?

Answer 1

findall returns a list of matches, and from the documentation:

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.
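To make the quoted rule concrete, here is a small sketch of the three cases. The sample text and variable names are made up for illustration and are not part of the original question.

```python
import re

text = "a1 b2 c3"

# No groups: findall returns the full text of each match.
no_groups = re.findall(r"[a-z]\d", text)

# One group: findall returns only what that group captured.
one_group = re.findall(r"[a-z](\d)", text)

# Two groups: findall returns one tuple per match.
two_groups = re.findall(r"([a-z])(\d)", text)

print(no_groups)   # ['a1', 'b2', 'c3']
print(one_group)   # ['1', '2', '3']
print(two_groups)  # [('a', '1'), ('b', '2'), ('c', '3')]
```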

So your original expression had one capturing group that matched three times in the string ("192.", "168." and "0."). A repeated group keeps only its last match, which here was 0. (including the trailing dot), and that is what findall returned.
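You can see the same thing from the re.search() side by comparing group(0) with group(1). This is just a quick check using the pattern and string from the question.

```python
import re

exp = r"(\d{1,3}\.){3}\d{1,3}"
ip = "blah blah 192.168.0.185 blah blah"

match = re.search(exp, ip)
whole = match.group(0)  # the entire match
last = match.group(1)   # last repetition of the capturing group

print(whole)                # 192.168.0.185
print(last)                 # 0.
print(re.findall(exp, ip))  # ['0.']
```

So search() and findall() are matching the same text; match.group() with no argument just reports group 0, while findall reports group 1.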

To fix your problem, use exp = "(?:\\d{1,3}\\.){3}\\d{1,3}". With the non-capturing version there are no groups to return, so the full match is returned in both cases.
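A short sketch of that fix, written with a raw string instead of doubled backslashes (the two spellings produce the same pattern). The second input string is an invented example to show more than one address being found.

```python
import re

exp = r"(?:\d{1,3}\.){3}\d{1,3}"  # non-capturing group, raw-string form
ip = "blah blah 192.168.0.185 blah blah"

from_search = re.search(exp, ip).group()
from_findall = re.findall(exp, ip)

print(from_search)   # 192.168.0.185
print(from_findall)  # ['192.168.0.185']

# Hypothetical input with two addresses.
several = re.findall(exp, "10.0.0.1 and 172.16.254.3")
print(several)       # ['10.0.0.1', '172.16.254.3']
```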

Answer 2

You're only capturing the 0. in that regex, as it'll be the last one that's caught.

Change the expression to capture the entire IP, and make the repeated part a non-capturing group:

In [1]: import re

In [2]: ip = "blah blah 192.168.0.185 blah blah"

In [3]: exp = r"((?:\d{1,3}\.){3}\d{1,3})"

In [4]: m = re.findall(exp, ip)

In [5]: m
Out[5]: ['192.168.0.185']

And if it helps to explain the regex:

In [6]: re.compile(exp, re.DEBUG)
subpattern 1
  max_repeat 3 3
    subpattern None
      max_repeat 1 3
        in
          category category_digit
      literal 46
  max_repeat 1 3
    in
      category category_digit

This explains the subpatterns. Subpattern 1 is what gets captured by findall.
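A related option, not mentioned in either answer, is re.finditer(), which yields match objects rather than group contents. Calling group(0) on each one gives the whole match even if the pattern still has a capturing group. The input string below is invented for illustration.

```python
import re

exp = r"(\d{1,3}\.){3}\d{1,3}"  # original pattern, capturing group left in
log = "from 10.0.0.1 to 192.168.0.185"

# group(0) is always the full match, regardless of capturing groups.
addresses = [m.group(0) for m in re.finditer(exp, log)]
print(addresses)  # ['10.0.0.1', '192.168.0.185']
```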

Answer 1
14 2012-01-25 10:24:23

Answer 2
4 2012-01-25 10:24:59
