Findall vs search for overwriting groups in Python
I found the topic "Capturing group with findall?", but unfortunately it is more basic and covers only groups that do not overwrite themselves.
Please take a look at the following examples:
import re
S = "abcabc"  # string used for all the cases below
print(re.findall(r"abc", S))  # ['abc', 'abc']
General idea: there are no groups here, so I expect findall to return a list of all matches. Please confirm.
In this case: findall looks for abc, finds it, returns it, then goes on and finds the second one.
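One way to confirm this is to compare findall with re.finditer, which yields match objects instead of strings. For a pattern without groups, findall should return the same list as collecting group(0) of every match. The scan resumes right after the end of the previous match, so matches never overlap. A minimal sketch (the extra "aaaa" example is only an illustration of the non-overlapping rule):

```python
import re

S = "abcabc"

# Without capture groups, findall returns the whole text of each match.
whole_matches = re.findall(r"abc", S)

# finditer yields match objects; group(0) is the whole match.
from_finditer = [m.group(0) for m in re.finditer(r"abc", S)]

# Scanning continues right after the previous match, so matches never overlap:
# in "aaaa" the pattern "aa" matches only at positions 0..2 and 2..4.
non_overlapping = re.findall(r"aa", "aaaa")

print(whole_matches)    # ['abc', 'abc']
print(from_finditer)    # ['abc', 'abc']
print(non_overlapping)  # ['aa', 'aa']
```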
print(re.findall(r"(abc)", S))  # ['abc', 'abc']
General idea: there are groups here, so I expect findall to return a list of all groups. Please confirm.
In this case: why two results when there is only one group? I understand it this way:
- findall looks for abc,
- finds it,
- places it in the group memory buffer,
- returns it,
- starts to look for abc again, and so on...
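findall produces one list element per match, and what that element contains depends on how many groups the pattern has: the whole match with no groups, the text of the single group with one group, and a tuple of all group texts with two or more. The sketch below compares the three cases on the same string; the two-group pattern `(a)(bc)` is just an illustration, not part of the question.

```python
import re

S = "abcabc"

no_groups = re.findall(r"abc", S)       # whole matches
one_group = re.findall(r"(abc)", S)     # group 1 of each match
two_groups = re.findall(r"(a)(bc)", S)  # a tuple of groups for each match

# The same information spelled out with finditer: one entry per match.
per_match = [(m.group(0), m.groups()) for m in re.finditer(r"(abc)", S)]

print(no_groups)   # ['abc', 'abc']
print(one_group)   # ['abc', 'abc']
print(two_groups)  # [('a', 'bc'), ('a', 'bc')]
print(per_match)   # [('abc', ('abc',)), ('abc', ('abc',))]
```

The number of results is driven by the number of matches, not by the number of groups.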
Is this reasoning correct?
print(re.findall(r"(abc)+", S))  # ['abc']
This looks similar to the above, yet returns only one abc. I understand it this way:
- findall looks for abc,
- finds it,
- places it in the group memory buffer,
- does not return it, because the RE itself demands to go on,
- finds another abc,
- places it in the group memory buffer (overwriting the previous abc),
- the string ends, so the search ends as well.
Is this reasoning correct? I am being very specific here, so if anything is wrong (even a tiny detail), please let me know.
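With (abc)+ every repetition captures the same text, so the overwriting cannot be seen in the output. A pattern whose repetitions capture different text makes it observable. In this sketch the pattern `(\w{3})+` and the string `abcedf` are chosen purely for illustration:

```python
import re

text = "abcedf"

# One match covers the whole string; group 1 is overwritten on each
# repetition of (\w{3}), so only the last three characters survive.
found = re.findall(r"(\w{3})+", text)

m = re.search(r"(\w{3})+", text)
whole, last, last_span = m.group(0), m.group(1), m.span(1)

print(found)      # ['edf']
print(whole)      # abcedf
print(last)       # edf
print(last_span)  # (3, 6)
```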
search scans through a string looking for a single match, so re.search(r"abc", S) and re.search(r"(abc)", S) rather obviously return only one abc. So let me get right to:
m = re.search(r"(abc)+", S)
print(m.group())   # abcabc
print(m.groups())  # ('abc',)
a) Of course the whole match is abcabc, but we still have groups here, so can I conclude that groups are irrelevant (despite the name) for m.group()? And is that why nothing gets overwritten for this method?
In fact, the grouping feature of parentheses is completely unnecessary here: in such cases I just want to use parentheses to show what needs to be taken together when repeating things, without creating any regex groups.
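Python's re module supports exactly this use case with non-capturing groups, written `(?:...)`: they group a subpattern for the quantifier but create no capture buffer. A short sketch of how findall and search behave with one:

```python
import re

S = "abcabc"

# (?:abc)+ groups "abc" for the + quantifier without creating a capture group,
# so findall falls back to returning whole matches.
found = re.findall(r"(?:abc)+", S)

m = re.search(r"(?:abc)+", S)
whole, groups = m.group(), m.groups()

print(found)   # ['abcabc']
print(whole)   # abcabc
print(groups)  # ()
```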
b) Can anyone explain the mechanism behind returning abcabc (in terms of buffers and so on), similar to what I did above for the (abc)+ findall case?
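Looking at spans helps separate the two things being asked about. m.group() with no argument is shorthand for m.group(0), the whole match, while m.group(1) holds whatever was last written to group 1. The span of group 1 shows that, even though the text is identical, it is the second abc that ends up in the buffer. A sketch:

```python
import re

S = "abcabc"
m = re.search(r"(abc)+", S)

# m.group() with no argument is the same as m.group(0): the whole match.
whole, whole_span = m.group(0), m.span(0)
# Group 1 keeps only the last repetition: the second "abc", at 3..6.
last, last_span = m.group(1), m.span(1)

print(m.group() == whole)  # True
print(whole, whole_span)   # abcabc (0, 6)
print(last, last_span)     # abc (3, 6)
print(m.groups())          # ('abc',)
```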
First, let me state some facts:

A match (match.group()) is the (sub)text that meets the whole pattern defined in a regular expression. Matches can contain zero or more capture groups. A capture group value (match.group(1..n)) is a part of the match (it can also be equal to the whole match if the whole pattern is enclosed in a capture group) that is matched by a parenthesized part of the pattern (a part of the pattern enclosed in a pair of unescaped parentheses). Some regex engines also give access to all the values captured by a quantified capture group such as (\w{3})+: in Python, this is possible with the PyPI regex module; in .NET, with a CaptureCollection; etc.

1: No groups here, so I expect findall to return a list of all matches - please confirm.
If the pattern contains capture groups, re.findall returns a list of the captured submatches. In the case of abc, which has no groups, re.findall returns a list of matches.

2: Why two results while there is only one group?
re.findall(r"(abc)", S) finds two matches in abcabc, and each match has one submatch (captured substring), so the resulting list has 2 elements (abc and abc).

3: Is this reasoning correct?
re.findall(r"(abc)+", S) looks for a match of the form abc, abcabc, abcabcabc, and so on. It matches abcabc as a whole and keeps the last abc in the capture group 1 buffer. So, I think your reasoning is correct.

4: The whole match is abcabc, but we still have groups here, so can I conclude that groups are irrelevant (despite the name) for m.group()? And is that why nothing gets overwritten for this method?

m.group() (the same as m.group(0)) always returns the whole match, no matter how many capture groups the pattern has; the groups are still filled and overwritten, though. With (abc)+ every repetition captures the same text, so you cannot see the overwriting. If you change the pattern to (\w{3})+ and the string to abcedf, you will see the difference, as the group 1 value in that case will be edf.

5: Can anyone explain the mechanism behind returning abcabc (in terms of buffers and so on), similar to what I did for the (abc)+ findall case?
re.search(r"(abc)+", S) will match abcabc (match, not capture) because:
- abcabc is searched for abc from left to right. The RE finds abc at the start, puts it into the capture group 1 buffer, and tries to find another abc right from the location after the first c.
- The RE finds the second abc and rewrites the capture group #1 buffer with it. It then tries to find another abc.
- No more abc is found, so the matched value is returned: abcabc.
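The steps above can be mimicked with a toy model. The helper `trace_repeated_group` below is hypothetical and deliberately simplified: it assumes a non-empty literal unit, a match starting at position 0, and no backtracking, and it uses str.startswith instead of a real regex engine. It only illustrates how each repetition overwrites the group 1 buffer, then cross-checks the result against re.search.

```python
import re

def trace_repeated_group(unit, text):
    """Hypothetical helper: a simplified model of how (unit)+ fills group 1.

    Assumes unit is a non-empty literal string and the match starts at
    position 0; it does not model backtracking or scanning later positions.
    """
    if not unit:
        raise ValueError("unit must be non-empty")
    steps = []
    pos = 0
    group1 = None  # the capture group #1 "buffer", stored as a (start, end) span
    while text.startswith(unit, pos):
        start, pos = pos, pos + len(unit)
        group1 = (start, pos)  # each repetition overwrites the previous value
        steps.append(f"found {unit!r} at {start}..{pos}, group 1 := {text[start:pos]!r}")
    if group1 is None:
        steps.append("no match at position 0")
        return steps, None, None
    steps.append(f"no further {unit!r} at {pos}; match = {text[:pos]!r}")
    return steps, text[:pos], group1

steps, whole, group1_span = trace_repeated_group("abc", "abcabc")
for line in steps:
    print(line)
# found 'abc' at 0..3, group 1 := 'abc'
# found 'abc' at 3..6, group 1 := 'abc'
# no further 'abc' at 6; match = 'abcabc'

# Cross-check the toy model against the real engine.
m = re.search("(" + re.escape("abc") + ")+", "abcabc")
print(whole == m.group(0), group1_span == m.span(1))  # True True
```

A real engine also tries later start positions and can backtrack into the quantifier; the model leaves both out to keep the buffer behavior visible.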
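Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.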