Previous group match in Python regex
I am trying to capture fragments of a string that look like %a, %b, etc. and replace them with some values. Additionally, I want to be able to escape the % character by typing %%.
In the example string %d%%f%x%%%g, read as %d %%f %x %% %g, I want to match %d, %x and %g.
My regular expression looks like this:
(?:[^%]|^)(?:%%)*(%[a-z])
(?:[^%]|^) - matches the beginning of the line or any character other than %
(?:%%)* - matches zero or more occurrences of %% (an escaped %)
(%[a-z]) - the actual match for the %a, %b, etc. patterns
The first two elements are there to support escaping of the % character.
However, when I run the regexp on the example string, the last fragment (%g) is not found:
>>> import re
>>> pat = re.compile("(?:[^%]|^)(?:%%)*(%[a-z])")
>>> pat.findall("%d%%f%x%%%g")
['%d', '%x']
But after adding a character before %%%g, it works fine:
>>> pat.findall("%d%%f%x %%%g")
['%d', '%x', '%g']
It looks like x is not matched against [^%] again after it has been matched as part of the group (%[a-z]). How can I change the regexp to force it to check the last character of the previous match again? I read about \G, but it didn't help.
Why didn't it pick up the %g?
To pick up the %g here, the match has to include the %% before it, and before that there must be either a non-% character or the beginning of the string. So x%%%g would be a match for you. But that x was already consumed by the previous match (i.e. when matching %x).
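A quick way to see which characters each match consumes is to print the span of the whole match next to the captured group. This is only an illustrative sketch using the pattern and example string from the question; the name spans is introduced here for the demonstration.

```python
import re

pat = re.compile(r"(?:[^%]|^)(?:%%)*(%[a-z])")

# Collect (span of whole match, whole match, captured group) for every match.
spans = [(m.span(), m.group(0), m.group(1)) for m in pat.finditer("%d%%f%x%%%g")]
for item in spans:
    print(item)
# ((0, 2), '%d', '%d')
# ((4, 7), 'f%x', '%x')
```

The second match covers indices 4 to 6, so the x at index 6 is already used up. The next search starts at index 7, where no non-% character is left in front of %%%g.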
Put simply, your matches would have to overlap. You can get around this by placing your regex inside a lookahead, (?= ... ), which does not consume any characters:
pat = re.compile("(?=(?:[^%]|^)(?:%%)*(%[a-z]))")
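As a quick check, assuming the same example string as in the question, the lookahead version can be run like this. Because the whole pattern sits inside a lookahead, each match is zero-width and only group 1 carries text; the names found and positions are just for this sketch.

```python
import re

pat = re.compile(r"(?=(?:[^%]|^)(?:%%)*(%[a-z]))")

found = pat.findall("%d%%f%x%%%g")
print(found)  # ['%d', '%x', '%g']

# The match itself is empty, so use the span of group 1 to locate each fragment.
positions = [m.span(1) for m in pat.finditer("%d%%f%x%%%g")]
print(positions)  # [(0, 2), (5, 7), (9, 11)]
```

When replacing the fragments afterwards, the group 1 spans are the ones to use, since the span of the match itself is empty.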
You need to construct your regex a little differently:
>>> import re
>>> regex = re.compile(r"(?:[^%]|%%)*(%[a-z])")
>>> regex.findall("%d%%f%x%%%g")
['%d', '%x', '%g']
Explanation:
(?:        # Start of a non-capturing group:
  [^%]     # Either match any character except %
 |         # or
  %%       # match an "escaped" %.
)*         # Do this any number of times.
(          # Match and capture in group 1:
  %[a-z]   # % followed by a lowercase ASCII letter
)          # End of capturing group
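One caveat that seems worth checking before relying on this pattern: it is not anchored, so when the text after the last real match contains an escaped %% followed by a letter, the search can restart in the middle of the %% pair. The extra input "%d%%f" below is an assumption made for this sketch, not part of the original question.

```python
import re

regex = re.compile(r"(?:[^%]|%%)*(%[a-z])")

# The example from the question: every match continues where the previous one ended.
chained = regex.findall("%d%%f%x%%%g")
print(chained)  # ['%d', '%x', '%g']

# Hypothetical input ending in an escaped sequence: after "%d", no match can start
# at index 2, so the search retries from index 3 and reports "%f".
trailing = regex.findall("%d%%f")
print(trailing)  # ['%d', '%f']
```

If such inputs can occur, the pattern needs an additional check on what precedes the start of each match.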
It seems to me that you want to catch only those %x portions that are preceded by an even number of % characters.
If so, the pattern is "(?<!%)(?:%%)*(%[a-z])"
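A small sketch to try this pattern on a few inputs; the test strings other than the one from the question are made up for illustration. The negative lookbehind (?<!%) only lets a match start where the previous character is not %, so the run of % before the letter is always counted from its beginning.

```python
import re

pat = re.compile(r"(?<!%)(?:%%)*(%[a-z])")

# Map each sample input to the fragments found in it.
samples = ["%d%%f%x%%%g", "%a", "%%a", "%%%a", "%%%%a"]
results = {s: pat.findall(s) for s in samples}
for s, found in results.items():
    print(repr(s), found)
# '%d%%f%x%%%g' ['%d', '%x', '%g']
# '%a' ['%a']
# '%%a' []
# '%%%a' ['%a']
# '%%%%a' []
```

Only the inputs where an even number of % characters (zero included) precedes the %a fragment produce a match.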