I am trying to capture fragments of a string that look like %a, %b, etc. and replace them with some values. Additionally, I want to be able to escape the % character by typing %%.
In the example string %d%%f%x%%%g I want to match %d, %x and %g, but not %%f or the escaped %% before %g.
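To pin down the intended escaping rule without any regex, here is a minimal left-to-right scanner. It is only a sketch: the helper name `find_placeholders` is hypothetical, and it assumes that placeholders are a single lowercase ASCII letter and that a % followed by anything else is simply ignored.

```python
from typing import List


def find_placeholders(text: str) -> List[str]:
    """Hypothetical reference scanner: collect %a..%z placeholders, skipping %% escapes."""
    found = []
    i = 0
    while i < len(text):
        if text[i] == "%" and i + 1 < len(text):
            nxt = text[i + 1]
            if nxt == "%":
                # "%%" is an escaped percent sign: skip both characters.
                i += 2
                continue
            if "a" <= nxt <= "z":
                # "%" followed by a lowercase ASCII letter is a placeholder.
                found.append(text[i:i + 2])
                i += 2
                continue
        # Anything else (ordinary text, or a % not followed by % or a-z)
        # is skipped one character at a time.
        i += 1
    return found


print(find_placeholders("%d%%f%x%%%g"))  # ['%d', '%x', '%g']
```

Because the scanner always pairs % signs from the left, it gives a simple oracle to check any of the regexes on this page against.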
My regular expression looks like this:
(?:[^%]|^)(?:%%)*(%[a-z])
- (?:[^%]|^) matches the beginning of the string or a character other than %
- (?:%%)* matches zero or more occurrences of %% (an escaped %)
- (%[a-z]) is the actual match for the %a, %b, etc. patterns
The first two elements are there to support escaping of the % character.
However, when I run the regexp on the example string, the last fragment (%g) is not found:
>>> import re
>>> pat = re.compile("(?:[^%]|^)(?:%%)*(%[a-z])")
>>> pat.findall("%d%%f%x%%%g")
['%d', '%x']
but after adding a character before %%%g, it works fine:
>>> pat.findall("%d%%f%x %%%g")
['%d', '%x', '%g']
It looks like x is not matched again by [^%] after it has been matched as part of the group (%[a-z]). How can I change the regexp to force it to check the last character of the previous match again? I read about \G, but it didn't help.
Why didn't it pick %g?
To match %g, the regex needs %% in front of it, and in front of that either a non-% character or the beginning of the string. So x%%%g could give you a match, but that x was already consumed by the previous match (the one that found %x).
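The consumed characters become visible if the question's pattern is run with `re.finditer` and each full match (group 0) is printed with its span. This is just a diagnostic sketch using the standard `re` module:

```python
import re

pat = re.compile(r"(?:[^%]|^)(?:%%)*(%[a-z])")

# Group 0 is everything the match consumed, not only the captured placeholder.
matches = [(m.group(0), m.span()) for m in pat.finditer("%d%%f%x%%%g")]
print(matches)  # [('%d', (0, 2)), ('f%x', (4, 7))]

# With a space before %%%g there is a fresh non-% character to start from.
spaced = [(m.group(0), m.span()) for m in pat.finditer("%d%%f%x %%%g")]
print(spaced)  # [('%d', (0, 2)), ('f%x', (4, 7)), (' %%%g', (7, 12))]
```

The second match ends at index 7, so the x at index 6 is already used up and the next search cannot use it as the [^%] in front of %%%g.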
In short, the matches you want would have to overlap. You can get around this by placing your regex inside a lookahead (?= ... ), which matches without consuming any characters:
pat = re.compile("(?=(?:[^%]|^)(?:%%)*(%[a-z]))")
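A quick check of the lookahead version on the example string, as a sketch with the standard `re` module. Because every match is empty, `m.span()` only gives the position where the lookahead started; the placeholder itself is at `m.span(1)`, which matters if you later want to replace it:

```python
import re

# The whole pattern sits inside a zero-width lookahead, so nothing is consumed
# and the character before each %-run can be inspected at the next position.
pat = re.compile(r"(?=(?:[^%]|^)(?:%%)*(%[a-z]))")

print(pat.findall("%d%%f%x%%%g"))  # ['%d', '%x', '%g']

# Positions of the captured placeholders themselves.
spans = [m.span(1) for m in pat.finditer("%d%%f%x%%%g")]
print(spans)  # [(0, 2), (5, 7), (9, 11)]
```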
You need to construct your regex a little differently:
>>> import re
>>> regex = re.compile(r"(?:[^%]|%%)*(%[a-z])")
>>> regex.findall("%d%%f%x%%%g")
['%d', '%x', '%g']
Explanation:
(?: # Start of a non-capturing group:
[^%] # Either match any character except %
| # or
%% # match an "escaped" %.
)* # Do this any number of times.
( # Match and capture in group 1:
%[a-z] # % followed by a lowercase ASCII letter
) # End of capturing group
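The commented layout above can be compiled directly with `re.VERBOSE`, which ignores the whitespace and the `#` comments. This is a sketch of the same pattern, not a change to it:

```python
import re

placeholder = re.compile(r"""
    (?:         # Start of a non-capturing group:
        [^%]    # either any character except %
      |         # or
        %%      # an "escaped" %.
    )*          # Repeat any number of times.
    (           # Capture in group 1:
        %[a-z]  # % followed by a lowercase ASCII letter
    )           # End of capturing group
""", re.VERBOSE)

print(placeholder.findall("%d%%f%x%%%g"))  # ['%d', '%x', '%g']
print(placeholder.findall("%%d"))          # ['%d']
```

One caveat worth checking: when an attempt at position 0 fails, `findall` retries one character later, in the middle of the escape. So on "%%d" this pattern reports '%d' even though the % is escaped. It gives the right answer on the example string, but inputs like this need one of the anchored variants (the lookahead or the lookbehind).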
It seems to me that you want to catch only those %x portions that are preceded by an even number of % signs.
If so, the pattern is (?<!%)(?:%%)*(%[a-z])
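A short check of the lookbehind pattern, sketched with the standard `re` module. The second and third strings are extra test inputs, not from the question:

```python
import re

# (?<!%) forbids a % immediately before the match. Since the match itself starts
# with %, it can only begin at the first % of a run, so %% pairs line up correctly.
pat = re.compile(r"(?<!%)(?:%%)*(%[a-z])")

print(pat.findall("%d%%f%x%%%g"))  # ['%d', '%x', '%g']
print(pat.findall("%%d"))          # [] : the % before d is escaped
print(pat.findall("%%%d"))         # ['%d']
```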