Regular Expression in Python: search() vs findall() for (\w)+
I have created a regular expression:
agentRegex = re.compile(r'Agent (\w)+')
Then I performed a search() operation:
agentRegex.search('Agent Alice told Agent Carol that Agent Eve knew Agent Bob was a double agent.').group()
I obtained 'Agent Alice' as output.
But when I performed a findall() operation:
agentRegex.findall('Agent Alice told Agent Carol that Agent Eve knew Agent Bob was a double agent.')
The output was ['e', 'l', 'e', 'b'].
Shouldn't the output be ['Agent Alice', 'Agent Carol', 'Agent Eve', 'Agent Bob']?
When the pattern contains a capturing group, re.findall() returns a list of the captured groups rather than the whole matches; in your case the group is (\w).
Get rid of the capturing group:
Agent \w+
Example:
>>> agentRegex = re.compile(r'Agent \w+')
>>> agentRegex.findall('Agent Alice told Agent Carol that Agent Eve knew Agent Bob was a double agent.')
['Agent Alice', 'Agent Carol', 'Agent Eve', 'Agent Bob']
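To see the difference side by side, here is a minimal sketch. The variable names (text, with_group, without_group, non_capturing) are illustrative and not from the original post. The non-capturing form (?:...) is one more option if you want to keep the parentheses for grouping without changing what findall() returns.

```python
import re

text = ('Agent Alice told Agent Carol that Agent Eve knew '
        'Agent Bob was a double agent.')

# One capturing group: findall() returns only what group 1 captured.
with_group = re.findall(r'Agent (\w)+', text)

# No group at all: findall() returns each whole match.
without_group = re.findall(r'Agent \w+', text)

# Non-capturing group: grouping without capture, so whole matches again.
non_capturing = re.findall(r'Agent (?:\w)+', text)

print(with_group)      # ['e', 'l', 'e', 'b']
print(without_group)   # ['Agent Alice', 'Agent Carol', 'Agent Eve', 'Agent Bob']
print(non_capturing)   # ['Agent Alice', 'Agent Carol', 'Agent Eve', 'Agent Bob']
```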
Your regex:
'Agent (\w)+'
It keeps matching and capturing single \w characters after 'Agent ', and each newly captured character overwrites the previous contents of the group. That's how you get ['e', 'l', 'e', 'b'], which are the last characters of ['Alice', 'Carol', 'Eve', 'Bob'].
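The overwriting behaviour can be checked on a single name. This is a small sketch with illustrative variable names (repeated_group, group_around_repeat, names); it also shows what happens if the + is moved inside the parentheses, which captures the whole word instead of its last character.

```python
import re

# Quantifier outside the group: the group is re-captured on every
# repetition, so only the last character survives.
repeated_group = re.match(r'Agent (\w)+', 'Agent Alice')
last_char = repeated_group.group(1)

# Quantifier inside the group: the group captures the whole word once.
group_around_repeat = re.match(r'Agent (\w+)', 'Agent Alice')
whole_word = group_around_repeat.group(1)

print(last_char)    # e
print(whole_word)   # Alice

# With findall(), the second pattern therefore returns just the names.
names = re.findall(r'Agent (\w+)', 'Agent Alice told Agent Carol')
print(names)        # ['Alice', 'Carol']
```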
You got the correct answer from .search().group() because group() defaults to group(0), which contains everything that was matched; if you call .search().group(1) instead, you will get 'e'.
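As a quick check of the group(0) versus group(1) distinction, here is a sketch using the original pattern. The names text, match and full_matches are illustrative. It also uses finditer(), which yields match objects, so the whole match stays available even though the pattern contains a capturing group.

```python
import re

agentRegex = re.compile(r'Agent (\w)+')
text = ('Agent Alice told Agent Carol that Agent Eve knew '
        'Agent Bob was a double agent.')

match = agentRegex.search(text)
print(match.group())    # Agent Alice  (same as group(0), the whole match)
print(match.group(0))   # Agent Alice
print(match.group(1))   # e            (last character captured by the group)

# finditer() gives match objects, so group(0) is available for every match.
full_matches = [m.group(0) for m in agentRegex.finditer(text)]
print(full_matches)     # ['Agent Alice', 'Agent Carol', 'Agent Eve', 'Agent Bob']
```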
What you are looking for is to capture 'Agent' as well as the next word, so you can try what heemayl and Dietrich suggested.
You could do this too:
import re
agentRegex = re.compile(r'Agent\s+[^\s]+')
print(agentRegex.findall('Agent Alice told Agent Carol that Agent Eve knew Agent Bob was a double agent.'))
# ['Agent Alice', 'Agent Carol', 'Agent Eve', 'Agent Bob']