
Python regex - matching character sequences using prior matched characters

I wish to match strings such as "zxxz" and "vbbv" where a character is followed by a pair of identical characters that do not match the first, then followed by the first. Therefore I do not wish to match strings such as "zzzz" and "vvvv".

I started with the following Python regex that matches all of those examples:

(.)(.)\2\1

In an attempt to exclude the second set ("zzzz", "vvvv"), I tried this modification:

(.)([^\1])\2\1

My reasoning is that the second group can contain any single character provided it is not the same as the one matched by the first group.

Unfortunately this does not seem to work as it still matches "zzzz" and "vvvv".

According to the Python 2.7.12 documentation:

\number

Matches the contents of the group of the same number. Groups are numbered starting from 1. For example, (.+) \1 matches 'the the' or '55 55', but not 'thethe' (note the space after the group). This special sequence can only be used to match one of the first 99 groups. If the first digit of number is 0, or number is 3 octal digits long, it will not be interpreted as a group match, but as the character with octal value number. Inside the '[' and ']' of a character class, all numeric escapes are treated as characters.

(My emphasis added to the final sentence.)

I find this sentence ambiguous, or at least unclear, because it suggests to me that the numeric escape should resolve as a single excluded character in the set, but this does not seem to happen.

Additionally, the following regex does not seem to work as I would expect either:

(.)[^\1][^\1][\1]

This doesn't seem to match "zzzz" or "zxxz".
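
The behaviour behind both observations can be checked directly. The following is a minimal sketch, assuming Python 3's re module, in which \1 inside a character class is parsed as the octal escape for the character with code 1 rather than as a backreference. The variable names are only illustrative.

```python
import re

# Inside [...], \1 is the octal escape for chr(1), not a reference to group 1.
in_class = re.compile(r'[^\1]')
print(in_class.match('z') is not None)     # True: 'z' is not chr(1)
print(in_class.match('\x01') is None)      # True: chr(1) is the one excluded character

# So ([^\1]) accepts 'z' as the second character, and 'zzzz' still matches.
loose = re.compile(r'(.)([^\1])\2\1')
print(loose.match('zzzz') is not None)     # True

# Likewise [\1] only matches a literal chr(1), so neither 'zxxz' nor 'zzzz' can match.
literal_end = re.compile(r'(.)[^\1][^\1][\1]')
print(literal_end.match('zxxz') is None)           # True
print(literal_end.match('zxx\x01') is not None)    # True
```

In other words, the documentation sentence means the escape is read as a fixed character code, not as "the character captured by group 1".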

Use a negative lookahead assertion, (?!\1), inside the second capture group so that the second character cannot be the same as the first:

r'(.)((?!\1).)\2\1'

Testing your examples:

>>> import re
>>> re.match(r'(.)((?!\1).)\2\1', 'zxxz')
<_sre.SRE_Match object at 0x109b661c8>
>>> re.match(r'(.)((?!\1).)\2\1', 'vbbv')
<_sre.SRE_Match object at 0x109b663e8>
>>> re.match(r'(.)((?!\1).)\2\1', 'zzzz') is None
True
>>> re.match(r'(.)((?!\1).)\2\1', 'vvvv') is None
True
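
If the same check is needed on longer text, the pattern can be compiled once and used with finditer. This is only a sketch under Python 3; the names ABBA and find_abba are hypothetical helpers, not part of the re module.

```python
import re

# (.)        first character
# ((?!\1).)  any character that is not the first one
# \2         the second character again
# \1         the first character again
ABBA = re.compile(r'(.)((?!\1).)\2\1')

def find_abba(text):
    """Return every non-overlapping a-b-b-a substring found in text."""
    return [m.group(0) for m in ABBA.finditer(text)]

print(find_abba('zxxz'))                # ['zxxz']
print(find_abba('zzzz'))                # []
print(find_abba('..vbbv..zzzz..abba'))  # ['vbbv', 'abba']
```

Note that . does not match a newline by default, so a sequence spanning a line break would not be found unless re.DOTALL is passed.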
