Previous group match in Python regular expressions

Question

I am trying to capture fragments of a string that look like %a, %b, etc. and replace them with some values. Additionally, I want to be able to escape the % character by typing %%.

In the example string %d%%f%x%%%g, read as %d %%f %x %% %g, I want to match %d, %x and %g.

My regular expression looks like this:

(?:[^%]|^)(?:%%)*(%[a-z])

(?:[^%]|^) - matches the beginning of the line or any character other than %
(?:%%)* - matches 0 or more occurrences of %% (an escaped %)
(%[a-z]) - the actual match for patterns such as %a, %b, etc.

The first two elements are there to support escaping of the % character.

However, when I run the regexp on the example string, the last fragment (%g) is not found:

>>> import re
>>> pat = re.compile("(?:[^%]|^)(?:%%)*(%[a-z])")
>>> pat.findall("%d%%f%x%%%g")
['%d', '%x']

But after adding a character before %%%g, it works fine:

>>> pat.findall("%d%%f%x %%%g")
['%d', '%x', '%g']

It looks like x is not matched against [^%] again after it has been consumed by the group (%[a-z]). How can I change the regexp to force it to check the last character of the previous match again? I read about \G, but it didn't help.

Answer 1

Why didn't it pick up the %g?

To pick up the %g, the regex needs the %% before it, and before that either a non-% character or the beginning of the string. So x%%%g would be a match. But that x was already consumed by the previous match (i.e. when %x was matched), and findall never revisits consumed characters.

Put simply, the matches your regex needs would have to overlap. You can overcome this by placing your regex inside a lookahead, (?= ... ), which matches without consuming any characters:
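To make the consumed x visible, here is a small sketch that prints the span of each match of the original pattern. The variable names are only illustrative; the pattern and the input string are the ones from the question.

```python
import re

# Original pattern from the question.
pat = re.compile(r"(?:[^%]|^)(?:%%)*(%[a-z])")
text = "%d%%f%x%%%g"

# Record (start, end) of each whole match, not just of group 1.
spans = [m.span() for m in pat.finditer(text)]
whole = [m.group(0) for m in pat.finditer(text)]

print(spans)  # [(0, 2), (4, 7)]
print(whole)  # ['%d', 'f%x']
```

The second match covers indices 4 to 6, so the next search starts at index 7, after the x. From there only % characters and the final g remain, and none of them can serve as the leading [^%] in front of %%%g.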

pat = re.compile("(?=(?:[^%]|^)(?:%%)*(%[a-z]))")
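As a quick check, the lookahead version can be run against the string from the question. This is a minimal sketch; the variable names are illustrative.

```python
import re

# The question's regex wrapped in a lookahead, so nothing is consumed.
pat = re.compile(r"(?=(?:[^%]|^)(?:%%)*(%[a-z]))")
text = "%d%%f%x%%%g"

found = pat.findall(text)
# Each match is zero-width, so use start(1) to locate the specifier itself.
positions = [m.start(1) for m in pat.finditer(text)]

print(found)      # ['%d', '%x', '%g']
print(positions)  # [0, 5, 9]
```

Because every match is empty, m.start() gives the position where the lookahead was tested rather than where the specifier begins, so group 1 is the useful handle here.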

Answer 2

You need to construct your regex a little differently:

>>> import re
>>> regex = re.compile(r"(?:[^%]|%%)*(%[a-z])")
>>> regex.findall("%d%%f%x%%%g")
['%d', '%x', '%g']

Explanation:

(?:      # Start of a non-capturing group:
 [^%]    # Either match any character except %
|        # or
 %%      # match an "escaped" %.
)*       # Do this any number of times.
(        # Match and capture in group 1:
 %[a-z]  # % followed by a lowercase ASCII letter
)        # End of capturing group
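The commented layout above can be kept in the source itself by compiling with re.VERBOSE, which ignores whitespace and # comments inside the pattern. A sketch, assuming you want the same regex as in this answer:

```python
import re

regex = re.compile(r"""
    (?:        # Start of a non-capturing group:
      [^%]     # either match any character except %
      |        # or
      %%       # match an "escaped" %.
    )*         # Do this any number of times.
    (          # Match and capture in group 1:
      %[a-z]   # % followed by a lowercase ASCII letter
    )          # End of capturing group
""", re.VERBOSE)

result = regex.findall("%d%%f%x%%%g")
print(result)  # ['%d', '%x', '%g']
```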

Answer 3

It seems to me that you want to catch only those %x portions that are preceded by an even number of % characters.

If so, the pattern is "(?<!%)(?:%%)*(%[a-z])"
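A short sketch of how the lookbehind pattern behaves, on the question's string and on an input with only an escaped percent sign. The second input, "%%d", is an extra test case and not from the question; the pattern from Answer 2 is included for comparison.

```python
import re

lookbehind = re.compile(r"(?<!%)(?:%%)*(%[a-z])")
consuming = re.compile(r"(?:[^%]|%%)*(%[a-z])")  # pattern from Answer 2

# The string from the question: all three specifiers are found.
example = lookbehind.findall("%d%%f%x%%%g")

# "%%d" is an escaped % followed by a plain d, so nothing should match.
escaped_lookbehind = lookbehind.findall("%%d")
escaped_consuming = consuming.findall("%%d")

print(example)             # ['%d', '%x', '%g']
print(escaped_lookbehind)  # []
print(escaped_consuming)   # ['%d']
```

The lookbehind refuses to start a match in the middle of a run of % characters, so an odd or even count is always measured from the start of the run. The Answer 2 pattern has no such guard: when no match is possible at index 0 of "%%d", the search retries at index 1 and reports %d there, which may or may not matter for your inputs.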

Answer 1: 3 (accepted), 2014-03-12 18:22:16

Answer 2: 2, 2014-03-12 18:23:12

Answer 3: 2, 2014-03-12 18:47:10
