Why does re.findall() give me different results than re.finditer() in Python?

Question

I wrote up this regular expression:

import re

p = re.compile(r'''
\[\[            #the first [[
[^:]*?          #no :s are allowed
.*?             #a bunch of chars
(
\|              #either go until a |
|\]\]           #or the last ]]
)
                ''', re.VERBOSE)

I want to use re.findall to get all the matching sections of some string. I wrote some test code, but it gives me bizarre results.

This code

g = p.finditer('   [[Imae|Lol]]     [[sdfef]]')
print(g)  # prints the iterator object itself (not shown in the output below)
for elem in g:
    print(elem.span())
    print(elem.group())

gives me this output:

(3, 10)
[[Imae|
(20, 29)
[[sdfef]]

Makes perfect sense, right? But when I do this:

h = p.findall('   [[Imae|Lol]]     [[sdfef]]')
for elem in h:
    print(elem)

the output is this:

|
]]

Why isn't findall() printing out the same results as finditer()?

Answer 1

When the pattern contains capturing groups, findall returns a list of the groups' text rather than the whole matches. The parentheses in your regex define a group that findall thinks you want, but you don't want groups. (?:...) is a non-capturing group. Change your regex to:

'''
\[\[            #the first [[
[^:]*?          #no :s are allowed
.*?             #a bunch of chars
(?:             #non-capturing group
\|              #either go until a |
|\]\]           #or the last ]]
)
                '''
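As a quick check of this fix, here is a minimal sketch that compiles the non-capturing version and runs it on the test string from the question. The variable name matches is only illustrative.

```python
import re

# Same pattern as above; only the final group is non-capturing.
p = re.compile(r'''
\[\[            # the opening [[
[^:]*?          # no colons allowed
.*?             # a bunch of chars
(?:             # non-capturing group
\|              # either go until a |
|\]\]           # or the closing ]]
)
''', re.VERBOSE)

matches = p.findall('   [[Imae|Lol]]     [[sdfef]]')
print(matches)  # ['[[Imae|', '[[sdfef]]']
```

With no capturing groups left in the pattern, findall falls back to returning each whole match, which is the same text finditer's group() gave.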

Answer 2

When you give re.findall() a regex with groups (parenthesized expressions) in it, it returns the groups that match. Here, you've only got one group, and it's the | or ]] at the end. On the other hand, in the code where you use re.finditer(), you're asking for no group in particular, so it gives you the entire match.

You can get re.findall() to do what you want by putting parentheses around the whole regex, or just around the part you're actually trying to extract. Assuming you're trying to parse wiki links, that would be the "bunch of chars" in line 4. For example,

p = re.compile(r'''
\[\[            #the first [[
[^:]*?          #no :s are allowed
(.*?)           #a bunch of chars
(
\|              #either go until a |
|\]\]           #or the last ]]
)
                ''', re.VERBOSE)

p.findall('   [[Imae|Lol]]     [[sdfef]]')

returns:

[('Imae', '|'), ('sdfef', ']]')]
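If only the link text is wanted, one possible variant (a sketch, not part of the original answer) is to keep the capturing group around the "bunch of chars" and make the terminator non-capturing. The names link_pattern and names are hypothetical.

```python
import re

# Hypothetical variant: capture only the link text; the terminator is non-capturing.
link_pattern = re.compile(r'''
\[\[            # the opening [[
[^:]*?          # no colons allowed
(.*?)           # the text we want to extract
(?:\||\]\])     # stop at | or ]] without capturing it
''', re.VERBOSE)

names = link_pattern.findall('   [[Imae|Lol]]     [[sdfef]]')
print(names)  # ['Imae', 'sdfef']
```

Because the pattern now has exactly one capturing group, findall returns a flat list of strings instead of a list of tuples.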

Answer 3

I think the key bit from the findall() documentation is this:

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.

Your regex has a group around the pipe or closing ]] here:

(
\|              #either go until a |
|\]\]           #or the last ]]
)

finditer() doesn't appear to have any such clause.

Answer 4

They don't return the same thing. Some snippets from the docs:

findall returns a list of strings. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.

finditer returns an iterator yielding MatchObject instances.

Answer 5

From the Python documentation:

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.

Note that it says if groups are present then a list of the group matches will be returned. The capturing group at the end of your regex matches, so only the captured part of each match is returned. This information is simply another field in the MatchObject when you use finditer.
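To illustrate the last point, the following sketch pulls both the whole match and the captured group out of each match object from finditer. The pattern is the one from the question written on a single line; the names text, whole and groups are illustrative.

```python
import re

# The question's pattern, compacted to one line (one capturing group at the end).
p = re.compile(r'\[\[[^:]*?.*?(\||\]\])')
text = '   [[Imae|Lol]]     [[sdfef]]'

whole = [m.group(0) for m in p.finditer(text)]   # the entire match
groups = [m.group(1) for m in p.finditer(text)]  # the first capturing group only

print(whole)                      # ['[[Imae|', '[[sdfef]]']
print(groups)                     # ['|', ']]']
print(groups == p.findall(text))  # True
```

In other words, findall here gives the same values as calling group(1) on every match, while group() with no argument (or group(0)) gives the full matched text.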
