I'm trying to use python regex on a pattern that has two sets of optional characters that may or may not be there. Below is what I'm trying to accomplish.
h becomes a when it is preceded by o. The o may optionally be followed by a colon (:), and then optionally by one of f, y, r (f|y|r), before the h.
So this rule would be applied to the following patterns.
o:fh -> o:fa
ofh -> ofa
o:h -> o:a
oh -> oa
Below is what I'm trying.
re.sub(ur"o[(:|)][(f|y|r)]h", "o\1\2a", word);
I'm really struggling with the grouping and the two sets of optional characters, : and (f|y|r), that may or may not be there. Any help is greatly appreciated. Thanks!
Regex elements are made optional by following them with ?, not by enclosing them in brackets. The correct way (well, a correct way) to write your expression is:
re.sub(ur"o(:?[fyr]?)h", ur"o\1a", word)
Note that the replacement string has to be raw (r" ") so that the \1 won't be interpreted as character 0x01.
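To check the rule against the four sample words from the question, here is a minimal sketch. It assumes Python 3, where the ur"" prefix used above is no longer valid syntax and a plain r"" prefix is used instead; apply_rule is a hypothetical helper name, not something from the original post.

```python
import re

# Same pattern as in the answer above, with Python 3's r"" prefix
# (assumption: Python 3; the ur"" prefix exists only in Python 2).
PATTERN = re.compile(r"o(:?[fyr]?)h")

def apply_rule(word):
    # Hypothetical helper: rewrite h -> a after o, optional ':' and optional f/y/r.
    return PATTERN.sub(r"o\1a", word)

for word in ["o:fh", "ofh", "o:h", "oh"]:
    print(word, "->", apply_rule(word))

# Without the raw prefix, "\1" is the single character 0x01,
# not a backreference to group 1.
broken = PATTERN.sub("o\1a", "oh")
print(repr(broken))
```

The last line shows why the raw prefix matters: the non-raw replacement inserts a control character instead of the captured text.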
Your syntax is incorrect: you are trying to use capturing groups inside character classes. A character class, in its simplest form, lists the characters that may be matched inside square brackets and matches any one character from the list, so (, | and ) inside [...] are treated as literal characters.
You can simply use one group, following the characters you want to be optional with ?
>>> re.sub(ur'(o:?[yrf]?)h', ur'\1a', word)
Explanation:
(           # group and capture to \1:
  o         #   'o'
  :?        #   ':' (optional)
  [yrf]?    #   any character of: 'y', 'r', 'f' (optional)
)           # end of \1
h           # 'h'
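The breakdown above can also be kept inside the pattern itself by compiling with re.VERBOSE, which ignores whitespace and # comments in the pattern. This is only a sketch, assuming Python 3; the name RULE is made up for illustration.

```python
import re

# RULE is a hypothetical name. re.VERBOSE lets the pattern carry its own comments.
RULE = re.compile(r"""
    (           # group and capture to \1:
      o         #   'o'
      :?        #   ':' (optional)
      [yrf]?    #   any one of 'y', 'r', 'f' (optional)
    )           # end of \1
    h           # 'h'
""", re.VERBOSE)

for word in ["o:fh", "ofh", "o:h", "oh"]:
    print(word, "->", RULE.sub(r"\1a", word))
```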
You could use the third-party regex module, which supports variable-length lookbehind.
>>> import regex
>>> regex.sub(r'(?<=o:?[yrf]?)h', 'a', word)
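If installing the regex module is not an option, a similar effect seems achievable with the standard re module, which requires each lookbehind to have a fixed width. The sketch below spells out the four possible prefixes as separate fixed-width lookbehinds; it assumes Python 3, and sub_h is a hypothetical helper name.

```python
import re

# Each lookbehind alternative has a fixed width (1, 2, 2 and 3 characters),
# which the standard re module accepts.
LOOKBEHIND = re.compile(r"(?:(?<=o)|(?<=o:)|(?<=o[yrf])|(?<=o:[yrf]))h")

def sub_h(word):
    # Hypothetical helper: only the h is matched, so the replacement is just "a".
    return LOOKBEHIND.sub("a", word)

for word in ["o:fh", "ofh", "o:h", "oh"]:
    print(word, "->", sub_h(word))
```

This is more verbose than the capturing-group versions, so it is mainly useful when the replacement must not touch the text before the h.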