
Multiple occurrences of the same character in a string regexp - Python

Given a string made up of 3 capital letters, 1 lowercase letter and another 3 capital letters, e.g. AAAaAAA

I can't seem to find a regexp that matches a string that has:

  • first 3 capital letters all different
  • any lowercase letter
  • next 2 capital letters the same as the very first one
  • last capital letter the same as the last capital letter in the first "trio"

e.g. A B C a AA C (without the spaces)


It turns out I needed something slightly different, e.g. ABCaAAC, where 'a' is the lowercase version of the very first character, not just any character.

The following should work:

^([A-Z])(?!.?\1)([A-Z])(?!\2)([A-Z])[a-z]\1\1\3$

For example:

>>> import re
>>> regex = re.compile(r'^([A-Z])(?!.?\1)([A-Z])(?!\2)([A-Z])[a-z]\1\1\3$')
>>> regex.match('ABAaAAA')  # fails: first three are not different
>>> regex.match('ABCaABC')  # fails: first two of second three are not first char
>>> regex.match('ABCaAAB')  # fails: last char is not last of first three
>>> regex.match('ABCaAAC')  # matches!
<_sre.SRE_Match object at 0x7fe09a44a880>


^          # start of string
([A-Z])    # match any uppercase character, place in \1
(?!.?\1)   # fail if either of the next two characters is the character in \1
([A-Z])    # match any uppercase character, place in \2
(?!\2)     # fail if the next character is the character in \2
([A-Z])    # match any uppercase character, place in \3
[a-z]      # match any lowercase character
\1         # match capture group 1
\1         # match capture group 1
\3         # match capture group 3
$          # end of string
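The annotated breakdown above can be kept inside the pattern itself by compiling with re.VERBOSE, which ignores whitespace and # comments in the pattern. A minimal sketch of that idea; the name VERBOSE_REGEX is only illustrative and not part of the original answer:

```python
import re

# Same pattern as above, written with re.VERBOSE so the comments live in the regex.
VERBOSE_REGEX = re.compile(r'''
    ^           # start of string
    ([A-Z])     # first capital, captured as \1
    (?!.?\1)    # neither of the next two characters may equal \1
    ([A-Z])     # second capital, captured as \2
    (?!\2)      # the next character must differ from \2
    ([A-Z])     # third capital, captured as \3
    [a-z]       # any lowercase letter
    \1\1        # the first capital, twice
    \3          # the third capital
    $           # end of string
''', re.VERBOSE)

for candidate in ('ABAaAAA', 'ABCaABC', 'ABCaAAB', 'ABCaAAC'):
    print(candidate, bool(VERBOSE_REGEX.match(candidate)))
```

Only the last candidate should print True, mirroring the interactive session above.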

If you want to pull these matches out from a larger chunk of text, just get rid of the ^ and $ and use regex.search() or regex.findall() .
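One thing to watch for when dropping the anchors: because the pattern contains capture groups, findall() returns tuples of the three captured letters rather than the whole matched substring. A small sketch that uses finditer() to get the full matches instead; the helper name find_codes and the sample text are made up for illustration:

```python
import re

# Unanchored variant of the pattern (no ^ or $), for searching inside longer text.
PATTERN = re.compile(r'([A-Z])(?!.?\1)([A-Z])(?!\2)([A-Z])[a-z]\1\1\3')

def find_codes(text):
    """Return every substring of text that matches the pattern."""
    # finditer yields match objects; group(0) is the whole match.
    # findall would instead return tuples of the three captured groups.
    return [m.group(0) for m in PATTERN.finditer(text)]

sample = 'xx ABCaAAC yy ABAaAAA zz QRSqQQS'
print(find_codes(sample))       # ['ABCaAAC', 'QRSqQQS']
print(PATTERN.findall(sample))  # [('A', 'B', 'C'), ('Q', 'R', 'S')]
```

Note that without anchors the pattern can also match inside a longer run of letters, so word boundaries (\b) may be worth adding depending on the input.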

You may, however, find the following approach easier to understand. It uses a regex for the basic validation, then normal string operations to test all of the extra requirements:

import re

def validate(s):
    # The regex checks the overall shape; the comparisons check the letter constraints.
    return bool(re.match(r'^[A-Z]{3}[a-z][A-Z]{3}$', s) and s[4] == s[0] and
                s[5] == s[0] and s[-1] == s[2] and len(set(s[:3])) == 3)

>>> validate('ABAaAAA')
False
>>> validate('ABCaABC')
False
>>> validate('ABCaAAB')
False
>>> validate('ABCaAAC')
True
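
The update to the question asks for the lowercase letter to be the lowercase form of the very first character, which neither version above checks. With the string-based approach this is one extra comparison. A minimal sketch, assuming plain ASCII letters; validate_strict is a hypothetical name, not from the original answer:

```python
import re

def validate_strict(s):
    """Like validate, but also require s[3] to be the lowercase form of s[0]."""
    return bool(
        re.match(r'^[A-Z]{3}[a-z][A-Z]{3}$', s)
        and len(set(s[:3])) == 3      # first three capitals all different
        and s[3] == s[0].lower()      # lowercase form of the first capital
        and s[4] == s[0] == s[5]      # next two capitals repeat the first
        and s[6] == s[2]              # last capital repeats the third
    )

print(validate_strict('ABCaAAC'))  # True
print(validate_strict('ABCbAAC'))  # False: 'b' is not the lowercase of 'A'
```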
