
Python Regex trying to find pattern that has two sets of optional characters

I'm trying to use python regex on a pattern that has two sets of optional characters that may or may not be there. Below is what I'm trying to accomplish.

h becomes a when it is preceded by o. The o may optionally be followed by a colon (:),
and then optionally by one of f, y, or r (f|y|r).

So this rule would be applied to the following patterns.

o:fh -> o:fa
ofh -> ofa
o:h -> o:a
oh -> oa

Below is what I'm trying.

re.sub(ur"o[(:|)][(f|y|r)]h", "o\1\2a", word);

I'm really struggling with the grouping and the two sets of optional characters : and (f|y|r) that may or may not be there. Any help is greatly appreciated. Thanks!

Regex elements are made optional by following them with ?, not by enclosing them in brackets. The correct way (well, a correct way) to write your expression is:

re.sub(ur"o(:?[fyr]?)h", ur"o\1a", word)

Note that the replacement string has to be raw (r"...") so that the \1 is treated as a backreference to group 1 and won't be interpreted as the character 0x01.
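As a rough check of the answer above, here is a minimal sketch in Python 3 spelling, where the ur"..." prefix is no longer valid and a plain r"..." is used instead. The names PATTERN and apply_rule are made up for this illustration, and the last lines show what happens when the replacement string is not raw.

```python
import re

# Python 3 spelling: plain r"..." raw strings (the ur"..." prefix is Python 2 only).
PATTERN = re.compile(r"o(:?[fyr]?)h")

def apply_rule(word):
    """Turn h into a after o, an optional ':', and an optional f, y, or r."""
    return PATTERN.sub(r"o\1a", word)

# The four patterns from the question, plus one word that should not change.
for sample in ["o:fh", "ofh", "o:h", "oh", "ah"]:
    print(sample, "->", apply_rule(sample))

# Without the r prefix, "\1" is the single character 0x01, not a backreference.
not_raw = PATTERN.sub("o\1a", "ofh")
print(repr(not_raw))
```

The group captures only the optional part (the colon and the f/y/r letter), so the replacement has to spell out the leading o itself.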

Your syntax is incorrect: you are trying to use capturing groups inside character classes. In its simplest form, a character class lists, inside square brackets, the characters that may be matched (it matches any one character from the list), so parentheses and | inside the brackets are just literal characters.
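To see what the pattern from the question actually matches, something like the following sketch may help. The variable name broken is only for this illustration. Each bracketed part is a character class that consumes exactly one character, so nothing is optional and nothing is captured.

```python
import re

# Inside [...], the characters ( : | ) are literals. [(:|)] matches exactly
# one of those four characters, and [(f|y|r)] matches one of ( f | y r ).
broken = re.compile(r"o[(:|)][(f|y|r)]h")

print(bool(broken.search("o:fh")))  # ':' and 'f' each satisfy one class
print(bool(broken.search("oh")))    # no match: both classes need a character
print(bool(broken.search("o|(h")))  # literal '|' and '(' are accepted too
print(broken.groups)                # number of capture groups in the pattern
```

Since the pattern contains no capture groups, a replacement that refers to \1 and \2 has nothing to refer to.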


You can simply use one group, following the characters you want to be optional with ?

>>> re.sub(ur'(o:?[yrf]?)h', ur'\1a', word)

Explanation:

(          # group and capture to \1:
  o        #   'o'
  :?       #   ':' (optional)
  [yrf]?   #   any character of: 'y', 'r', 'f' (optional)
)          # end of \1
h          # 'h'
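The annotated layout above can be kept in real code by compiling with re.VERBOSE, which ignores whitespace and # comments inside the pattern. This is a sketch in Python 3 spelling; the names RULE and h_to_a are assumptions made for the example.

```python
import re

# re.VERBOSE lets the pattern carry its own explanation.
RULE = re.compile(r"""
    (          # group and capture to \1:
      o        #   'o'
      :?       #   ':' (optional)
      [yrf]?   #   any character of: 'y', 'r', 'f' (optional)
    )          # end of \1
    h          # 'h'
""", re.VERBOSE)

def h_to_a(word):
    """Keep everything captured in group 1 and replace the trailing h with a."""
    return RULE.sub(r"\1a", word)

for sample in ["o:fh", "ofh", "o:h", "oh"]:
    print(sample, "->", h_to_a(sample))
```

Here the o sits inside the group, so the replacement is just \1a with no literal o in front.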

You could also use the third-party regex module, which, unlike the standard re module, supports variable-length lookbehind.

>>> import regex
>>> regex.sub(r'(?<=o:?[yrf]?)h', 'a', word)
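For comparison, the standard re module rejects that lookbehind because its width varies. If installing regex is not an option, one possible workaround (an assumption of this sketch, not part of the answer above) is to spell out each fixed-width case as its own lookbehind. The names stdlib_accepts, FIXED_WIDTH, and replace_h are hypothetical.

```python
import re

# The standard library refuses a lookbehind whose width is not fixed.
try:
    re.compile(r"(?<=o:?[yrf]?)h")
    stdlib_accepts = True
except re.error as exc:
    stdlib_accepts = False
    print("re rejected the pattern:", exc)

# Workaround sketch: one fixed-width lookbehind per possible prefix length.
FIXED_WIDTH = re.compile(r"(?:(?<=o)|(?<=o:)|(?<=o[yrf])|(?<=o:[yrf]))h")

def replace_h(word):
    """Replace only the h; the lookbehinds consume nothing, so no group is needed."""
    return FIXED_WIDTH.sub("a", word)

for sample in ["o:fh", "ofh", "o:h", "oh", "ah"]:
    print(sample, "->", replace_h(sample))
```

The capturing-group versions are shorter and easier to read. The lookbehind form is mainly useful when the text before h must not be part of the match.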
