Regular expressions in Python: search() vs findall() with (\w)+

Question

I created the following regular expression:

agentRegex = re.compile(r'Agent (\w)+')

Then I ran search() on it:

agentRegex.search('Agent Alice told Agent Carol that Agent Eve knew Agent Bob was a double agent.').group()

I got 'Agent Alice' as output.

But when I ran findall() with the same pattern:

agentRegex.findall('Agent Alice told Agent Carol that Agent Eve knew Agent Bob was a double agent.')

The output was ['e', 'l', 'e', 'b'].

Shouldn't the output be ['Agent Alice', 'Agent Carol', 'Agent Eve', 'Agent Bob']?

Answer 1

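When the pattern contains a capturing group, re.findall() returns a list of what that group captured rather than the full matches. In your case the group is (\w), and because it is repeated by +, only the last character it captured is returned for each match.

Remove the capturing group: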

Agent \w+

Example:

>>> agentRegex = re.compile(r'Agent \w+')

>>> agentRegex.findall('Agent Alice told Agent Carol that Agent Eve knew Agent Bob was a double agent.') 
['Agent Alice', 'Agent Carol', 'Agent Eve', 'Agent Bob']
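
As a rough sketch of the rule behind this, the snippet below shows how findall() behaves with no capturing group, one group, several groups, and a non-capturing group. It reuses the sentence from the question; the variable names and the two-group pattern are illustrative assumptions, not part of the original answer.

```python
import re

text = 'Agent Alice told Agent Carol that Agent Eve knew Agent Bob was a double agent.'

# No capturing group: findall() returns the full text of each match.
no_group = re.findall(r'Agent \w+', text)

# One capturing group: findall() returns only what that group captured.
one_group = re.findall(r'Agent (\w+)', text)

# Several capturing groups: findall() returns one tuple per match.
two_groups = re.findall(r'(Agent) (\w+)', text)

# Non-capturing group (?:...): groups without capturing, so full matches come back.
non_capturing = re.findall(r'Agent (?:\w)+', text)

print(no_group)       # ['Agent Alice', 'Agent Carol', 'Agent Eve', 'Agent Bob']
print(one_group)      # ['Alice', 'Carol', 'Eve', 'Bob']
print(two_groups)     # [('Agent', 'Alice'), ('Agent', 'Carol'), ('Agent', 'Eve'), ('Agent', 'Bob')]
print(non_capturing)  # ['Agent Alice', 'Agent Carol', 'Agent Eve', 'Agent Bob']
```

If the parentheses are needed only for grouping, (?:...) keeps the structure of the pattern while letting findall() return whole matches.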

Answer 2

Your regex:

'Agent (\w)+'

It keeps matching and capturing single \w characters after 'Agent ', and each repetition overwrites the captured group with the next character. That is how you get ['e', 'l', 'e', 'b'], which are the last characters of ['Alice', 'Carol', 'Eve', 'Bob'].
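
The overwriting can be seen directly on a single match. This is a minimal sketch; the shortened input string and the variable names are assumptions made for illustration.

```python
import re

sample = 'Agent Alice told Agent Carol'

# (\w)+ : the group holds one character and is repeated once per letter
# of "Alice"; only the last repetition is kept in group 1.
m_repeated = re.search(r'Agent (\w)+', sample)
last_char = m_repeated.group(1)

# (\w+) : the quantifier is inside the group, so the group spans the whole word.
m_whole = re.search(r'Agent (\w+)', sample)
whole_word = m_whole.group(1)

print(last_char)   # e
print(whole_word)  # Alice
```

The only difference between the two patterns is whether + sits outside or inside the parentheses.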

You got the expected result from .search().group() because group() defaults to group(0), which contains the entire match. If you call .search().group(1) instead, you get 'e'.
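
A small sketch of that difference, using the pattern from the question unchanged. The finditer() call is an addition not mentioned in the original answers; it is one way to get group(0) for every match without editing the pattern.

```python
import re

agentRegex = re.compile(r'Agent (\w)+')
text = 'Agent Alice told Agent Carol that Agent Eve knew Agent Bob was a double agent.'

match = agentRegex.search(text)
full_match = match.group()      # same as match.group(0): the entire match
first_group = match.group(1)    # last character captured by (\w)

# finditer() yields match objects, so group(0) is available for each match.
all_full_matches = [m.group(0) for m in agentRegex.finditer(text)]

print(full_match)        # Agent Alice
print(first_group)       # e
print(all_full_matches)  # ['Agent Alice', 'Agent Carol', 'Agent Eve', 'Agent Bob']
```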

What you want is to match 'Agent' together with the word that follows it, so you can use the patterns that heemayl and Dietrich suggested.

Answer 3

You could do this too:

import re
agentRegex = re.compile(r'Agent\s+[^\s]+')
print(agentRegex.findall('Agent Alice told Agent Carol that Agent Eve knew Agent Bob was a double agent.'))
# ['Agent Alice', 'Agent Carol', 'Agent Eve', 'Agent Bob']
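
One caveat worth checking with this variant: [^\s]+ (equivalent to \S+) matches any non-whitespace character, so punctuation attached to the name is included, while \w+ stops at it. The sketch below uses a made-up sentence to show the difference; the input and variable names are assumptions.

```python
import re

# Hypothetical input where the names are followed directly by punctuation.
text = 'I met Agent Bob. Then Agent Eve, too.'

# [^\s]+ consumes everything up to the next whitespace, punctuation included.
with_non_space = re.findall(r'Agent\s+[^\s]+', text)

# \w+ consumes only letters, digits, and underscores.
with_word_chars = re.findall(r'Agent\s+\w+', text)

print(with_non_space)   # ['Agent Bob.', 'Agent Eve,']
print(with_word_chars)  # ['Agent Bob', 'Agent Eve']
```

For the sentence in the question both patterns give the same result, because every agent name there is followed by a space.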


Answer 1
1 2017-02-07 05:54:09

Answer 2
1 2017-02-07 06:09:27

Answer 3
0 2017-02-07 06:12:33
