What regular expression do I need to find 3 uppercase letters with 1 lowercase letter between them?
For example, I have: sDdDSADadasAHHdHSAsdsagfGoHHHfHHHH
But I need: AHHdHSA
I'm new to regular expressions, but something like [A-Z]{3}[a-z]{1}[A-Z]{3} will also find HHHfHHH, and I need exactly 3 uppercase letters, with the next one being lowercase. I need to get only AHHdHSA.
You could make use of lookarounds to assert not an uppercase char before and after the 3 uppercase chars.
(?<![A-Z])[A-Z]{3}[a-z][A-Z]{3}(?![A-Z])
(?<![A-Z]) Negative lookbehind, assert no uppercase char on the left
[A-Z]{3} Match 3 uppercase chars A-Z
[a-z] Match a single lowercase char (note that you can omit the {1})
[A-Z]{3} Match 3 uppercase chars
(?![A-Z]) Negative lookahead, assert no uppercase char on the right
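To make the effect of the lookarounds concrete, here is a minimal sketch that runs the pattern from the question and the lookaround pattern against the sample string. The variable names (naive, strict, and so on) are only illustrative.

```python
import re

text = "sDdDSADadasAHHdHSAsdsagfGoHHHfHHHH"

# Pattern from the question: nothing constrains the neighbouring characters.
naive = re.compile(r"[A-Z]{3}[a-z][A-Z]{3}")

# Pattern with lookarounds: an uppercase neighbour on either side is rejected.
strict = re.compile(r"(?<![A-Z])[A-Z]{3}[a-z][A-Z]{3}(?![A-Z])")

naive_matches = naive.findall(text)
strict_matches = strict.findall(text)

print(naive_matches)   # ['AHHdHSA', 'HHHfHHH']
print(strict_matches)  # ['AHHdHSA']
```

HHHfHHH is dropped by the second pattern because it is followed by a fourth H, which the negative lookahead forbids.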
(?<=[^A-Z]|^)[A-Z]{3}[a-z][A-Z]{3}(?=[^A-Z]|$) might also work.
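To use this pattern in Python, we would want the third-party regex module installed, since the lookbehind here has alternatives of different widths, which the built-in re module does not accept: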
$ pip3 install regex
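As a quick check of why the extra module comes into play, the sketch below tries to compile the same pattern with the built-in re module. It assumes a standard CPython re, which requires lookbehinds to have a fixed width; the names variable_width and compiled_ok are just for illustration.

```python
import re

# The lookbehind has two alternatives of different widths:
# [^A-Z] is one character wide, ^ is zero characters wide.
variable_width = r"(?<=[^A-Z]|^)[A-Z]{3}[a-z][A-Z]{3}(?=[^A-Z]|$)"

try:
    re.compile(variable_width)
    compiled_ok = True
except re.error as exc:
    # The built-in module refuses variable-width lookbehinds.
    compiled_ok = False
    print("re rejected the pattern:", exc)
```

The negative-lookbehind version, (?<![A-Z]), is always one character wide, so it does not run into this restriction.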
Otherwise, I think the pattern from the answer above would be a better choice, since you can implement it with the built-in re module:
import re
string = '''
sDdDSADadasAHHdHSAsdsagfGoHHHfHHHH
AHHdHSA
'''
expression = r'(?<![A-Z])[A-Z]{3}[a-z][A-Z]{3}(?![A-Z])'
print(re.findall(expression, string))
# The same test with the third-party regex module and the second pattern
import regex as re
string = '''
sDdDSADadasAHHdHSAsdsagfGoHHHfHHHH
AHHdHSA
'''
expression = r'(?<=[^A-Z]|^)[A-Z]{3}[a-z][A-Z]{3}(?=[^A-Z]|$)'
print(re.findall(expression, string))
Output:
['AHHdHSA', 'AHHdHSA']
If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com, where you can also see how it matches against some sample inputs.
You can use groups to grab the pattern while also matching the text around it:
import re
# match the pattern, followed by one or more lowercase letters
pat1 = re.compile(r'([A-Z]{3}[a-z]{1}[A-Z]{3})([a-z]+)')
# should yield what you need
pat1.search('sDdDSADadasAHHdHSAsdsagfGoHHHfHHHH').group(1)  # 'AHHdHSA'
# as an explanation for group capture, run this:
mymatch = pat1.search('sDdDSADadasAHHdHSAsdsagfGoHHHfHHHH')
mymatch.group(0)  # the whole match: 'AHHdHSAsdsagf'
mymatch.group(1)  # the first group: 'AHHdHSA'
mymatch.group(2)  # the second group: 'sdsagf'
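One thing to keep in mind with the group-based pattern is that it only checks what comes after the match, and it requires at least one lowercase letter there. The following sketch compares it with the lookaround pattern on a few made-up inputs; the sample strings and the helper first_hit are hypothetical and only serve the comparison.

```python
import re

grouped = re.compile(r"([A-Z]{3}[a-z][A-Z]{3})([a-z]+)")
lookaround = re.compile(r"(?<![A-Z])[A-Z]{3}[a-z][A-Z]{3}(?![A-Z])")

def first_hit(pattern, text):
    """Return the 7-letter hit for either pattern, or None if nothing matches."""
    m = pattern.search(text)
    if m is None:
        return None
    # The grouped pattern keeps the hit in group 1, the lookaround one in group 0.
    return m.group(1) if pattern.groups else m.group(0)

samples = ["xxAHHdHSAyy", "AHHdHSA", "XAHHdHSAyy"]

grouped_hits = {s: first_hit(grouped, s) for s in samples}
lookaround_hits = {s: first_hit(lookaround, s) for s in samples}

for s in samples:
    print(s, grouped_hits[s], lookaround_hits[s])
```

On these inputs the two patterns agree on the first sample, the grouped pattern misses the second (nothing follows the match), and it accepts the third even though an uppercase X precedes the match.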