
Find uppercase letters with one lowercase letter between them

What regular expression do I need to find 3 uppercase letters with 1 lowercase letter between them?

For example, I have: sDdDSADadasAHHdHSAsdsagfGoHHHfHHHH

But I need: AHHdHSA

I'm new to regular expressions. Something like [A-Z]{3}[a-z]{1}[A-Z]{3} also finds HHHfHHH, but I need exactly 3 uppercase letters on each side of the single lowercase letter, so the only result should be AHHdHSA.

You could make use of lookarounds to assert not an uppercase char before and after the 3 uppercase chars.

(?<![A-Z])[A-Z]{3}[a-z][A-Z]{3}(?![A-Z])
  • (?<![A-Z]) Negative lookbehind, assert no uppercase char on the left
  • [A-Z]{3} Match 3 uppercase chars A-Z
  • [a-z] Match a single lowercase char (note that you can omit the {1})
  • [A-Z]{3} Match 3 uppercase chars A-Z
  • (?![A-Z]) Negative lookahead, assert no uppercase char on the right
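To see what the lookarounds change, here is a minimal sketch that runs the pattern from the question and the lookaround version against the sample string. The variable names are only illustrative.

```python
import re

sample = "sDdDSADadasAHHdHSAsdsagfGoHHHfHHHH"

# Pattern from the question: no check on the neighbouring characters.
naive = re.compile(r"[A-Z]{3}[a-z][A-Z]{3}")

# Same core, guarded by a negative lookbehind and a negative lookahead.
guarded = re.compile(r"(?<![A-Z])[A-Z]{3}[a-z][A-Z]{3}(?![A-Z])")

naive_matches = naive.findall(sample)
guarded_matches = guarded.findall(sample)

print(naive_matches)    # ['AHHdHSA', 'HHHfHHH']
print(guarded_matches)  # ['AHHdHSA']
```

The second candidate is dropped because HHHfHHH is followed by a fourth H, which the negative lookahead rejects.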

Regex demo

Maybe,

(?<=[^A-Z]|^)[A-Z]{3}[a-z][A-Z]{3}(?=[^A-Z]|$)

would do then.

Demo


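To use this pattern in Python we would need the third-party regex module, because the built-in re module only accepts fixed-width lookbehinds, and (?<=[^A-Z]|^) mixes a one-character alternative with a zero-width one: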

$ pip3 install regex

Otherwise, the pattern in the answer above is probably the better choice, since it works with the built-in re module:

import re

string = '''
sDdDSADadasAHHdHSAsdsagfGoHHHfHHHH
AHHdHSA
'''

expression = r'(?<![A-Z])[A-Z]{3}[a-z][A-Z]{3}(?![A-Z])'

print(re.findall(expression, string))

Test

import regex as re

string = '''
sDdDSADadasAHHdHSAsdsagfGoHHHfHHHH
AHHdHSA
'''

expression = r'(?<=[^A-Z]|^)[A-Z]{3}[a-z][A-Z]{3}(?=[^A-Z]|$)'

print(re.findall(expression, string))

Output

['AHHdHSA', 'AHHdHSA']
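As a quick check of why the second pattern needs the regex module, the sketch below tries to compile it with the built-in re module. This assumes a CPython version whose re module still requires fixed-width lookbehinds, which is the case at the time of writing; the variable names are only illustrative.

```python
import re

pattern = r"(?<=[^A-Z]|^)[A-Z]{3}[a-z][A-Z]{3}(?=[^A-Z]|$)"

try:
    re.compile(pattern)
    compiled_ok = True
except re.error as exc:
    # The lookbehind alternatives have different widths (1 and 0),
    # so the built-in engine refuses to compile the pattern.
    compiled_ok = False
    print("re rejected the pattern:", exc)

print(compiled_ok)
```

The negative-lookbehind form (?<![A-Z]) avoids the problem because it has a fixed width of one character and still succeeds at the start of the string.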

If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com, where you can also watch how it matches against some sample inputs.


RegEx Circuit

jex.im visualizes regular expressions.

You can use groups to capture the pattern while also matching the text around it:

import re

# match the pattern only when at least one lowercase letter follows it
pat1 = re.compile(r'([A-Z]{3}[a-z]{1}[A-Z]{3})([a-z]+)')

# should yield what you need: 'AHHdHSA'
print(pat1.search('sDdDSADadasAHHdHSAsdsagfGoHHHfHHHH').group(1))

# as an explanation for group capture, run this:
mymatch = pat1.search('sDdDSADadasAHHdHSAsdsagfGoHHHfHHHH')
print(mymatch.group(0))  # whole match: 'AHHdHSAsdsagf'
print(mymatch.group(1))  # first group: 'AHHdHSA'
print(mymatch.group(2))  # second group: 'sdsagf'
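The group-based pattern works on the sample string, but it behaves differently from the lookaround version in two edge cases. The sketch below compares them on two made-up inputs (at_end and extra_left are hypothetical test strings, not from the question).

```python
import re

grouped = re.compile(r"([A-Z]{3}[a-z][A-Z]{3})([a-z]+)")
guarded = re.compile(r"(?<![A-Z])[A-Z]{3}[a-z][A-Z]{3}(?![A-Z])")

at_end = "xxAHHdHSA"      # target sits at the very end of the string
extra_left = "XAHHdHSAs"  # four uppercase letters before the lowercase one

# The grouped pattern requires a trailing lowercase letter, so it misses this.
grouped_at_end = grouped.search(at_end)
guarded_at_end = guarded.findall(at_end)

# The grouped pattern does not look at the character on the left,
# so it still reports a match inside the run of four uppercase letters.
grouped_extra = grouped.search(extra_left).group(1)
guarded_extra = guarded.findall(extra_left)

print(grouped_at_end, guarded_at_end)  # None ['AHHdHSA']
print(grouped_extra, guarded_extra)    # AHHdHSA []
```

Whether these cases matter depends on the input, so the group approach may be fine when the target is always surrounded by lowercase letters.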

