I'm having trouble understanding regex behaviour when using a lookahead.
I have a string containing two overlapping patterns (each starting with M and ending with p). My expected output would be MGMTPRLGLESLLEp and MTPRLGLESLLEp. My Python code below yields two empty strings whose start positions coincide with the starts of the expected matches.
Removing the lookahead (?=) yields only ONE match, the longer one. Is there a way to modify my regex to avoid the empty strings, so that I get both results with a single pattern?
import re

string = 'GYMGMTPRLGLESLLEpApMIRVA'
pattern = re.compile(r'(?=M(.*?)p)')
sequences = pattern.finditer(string)
for results in sequences:
    print(results.group())
    print(results.start())
    print(results.end())
The overlapping-matches trick with a look-ahead relies on the fact that the (?=...) pattern itself matches at a zero-width position, so group() is an empty string, while the capturing group nested inside the look-ahead still records the text you are after.
You need to print group 1 explicitly:
for results in sequences:
    print(results.group(1))
This produces:
GMTPRLGLESLLE
TPRLGLESLLE
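To see why the original loop printed empty strings, here is a small sketch that lists, for each match of the question's pattern, the whole match, its start and end, and group 1. The variable name rows is only for illustration.

```python
import re

string = 'GYMGMTPRLGLESLLEpApMIRVA'
pattern = re.compile(r'(?=M(.*?)p)')

# For every match, record: whole match, start, end, captured group 1.
rows = [(m.group(0), m.start(), m.end(), m.group(1))
        for m in pattern.finditer(string)]

for row in rows:
    print(row)
# ('', 2, 2, 'GMTPRLGLESLLE')
# ('', 4, 4, 'TPRLGLESLLE')
```

The whole match is empty and start equals end, because a look-ahead consumes no characters. That is also what lets the scan move on by a single position and find the second, overlapping match.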
You probably want to include the M and p characters in the capturing group:
pattern = re.compile(r'(?=(M.*?p))')
at which point your output becomes:
MGMTPRLGLESLLEp
MTPRLGLESLLEp
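If you also need the real positions of each match, note that results.start() and results.end() refer to the zero-width look-ahead; the span of the captured text is available via start(1) and end(1). Below is a minimal sketch that wraps this up; find_overlapping is a hypothetical helper name, not something from the re module, and it assumes the pattern you pass in is a valid regex on its own.

```python
import re

def find_overlapping(pattern, text):
    """Return (start, end, matched_text) for every match, overlaps included.

    The pattern is wrapped as (?=(...)), so group 1 is always the outer
    group; any groups inside `pattern` are numbered from 2 onwards.
    """
    lookahead = re.compile(f'(?=({pattern}))')
    return [(m.start(1), m.end(1), m.group(1))
            for m in lookahead.finditer(text)]

hits = find_overlapping(r'M.*?p', 'GYMGMTPRLGLESLLEpApMIRVA')
for start, end, seq in hits:
    print(start, end, seq)
# 2 17 MGMTPRLGLESLLEp
# 4 17 MTPRLGLESLLEp
```

The third M (at index 19) produces nothing because no p follows it. Also keep in mind that this approach yields at most one match per start position; with the lazy .*? that is the shortest candidate starting there.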