Empty regex matches when using finditer with a lookahead

I'm having trouble understanding how a regex behaves when it uses a lookahead.

I have a string that contains two overlapping substrings, each starting with M and ending with p. The output I expect is MGMTPRLGLESLLEp and MTPRLGLESLLEp. My Python code below instead prints two empty strings, which start at the same positions as the expected results.

Removing the lookahead (?=...) gives only ONE result, the longer one. Is there a way to change my regex so that it does not produce empty strings and a single pattern returns both results?
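A short sketch of why dropping the lookahead loses the second result: a normal pattern consumes the characters it matches, so scanning resumes after the first p and the inner M at index 4 can no longer start a match. It uses only the standard `re` module and the string from the question.

```python
import re

text = 'GYMGMTPRLGLESLLEpApMIRVA'

# Without the lookahead the match consumes 'MGMTPRLGLESLLEp' (indices 2..16),
# so the scan restarts at index 17 and the M at index 4 is never revisited.
# The later M at index 19 has no 'p' after it, so it cannot match either.
consuming = re.findall(r'M.*?p', text)
print(consuming)  # ['MGMTPRLGLESLLEp']
```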

import re

string = 'GYMGMTPRLGLESLLEpApMIRVA'

pattern = re.compile(r'(?=M(.*?)p)')
sequences = pattern.finditer(string)

for results in sequences:
    print(results.group())
    print(results.start())
    print(results.end())

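The trick for finding overlapping matches with a lookahead relies on the fact that (?=...) is zero-width: it matches at a position without consuming any characters. Because nothing is consumed, finditer keeps scanning from the following positions and the results can overlap, but the overall match (group 0, which is what group() returns) is always an empty string. The actual text has to be pulled out of a capturing group nested inside the lookahead.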
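To make the zero-width behaviour visible, this sketch inspects each match object from the question's pattern: group 0 is empty and start() equals end(), while group 1 holds the text between M and p. The variable names are illustrative only.

```python
import re

text = 'GYMGMTPRLGLESLLEpApMIRVA'
pattern = re.compile(r'(?=M(.*?)p)')

matches = list(pattern.finditer(text))
for m in matches:
    # The lookahead consumes nothing, so group 0 is '' and the span is empty.
    assert m.group() == '' and m.start() == m.end()
    # The interesting text is in capturing group 1.
    print(m.start(), repr(m.group()), repr(m.group(1)))
```

This prints the positions 2 and 4 (the two M characters) alongside an empty group 0 and the captured inner text.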

You need to print group 1 explicitly:

for results in sequences:
    print(results.group(1))

This produces:

GMTPRLGLESLLE
TPRLGLESLLE
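If you only need the strings and not the match objects, `re.findall` gives the same result in one call. When a pattern has exactly one capturing group, findall returns that group's text for each match rather than the empty overall match.

```python
import re

text = 'GYMGMTPRLGLESLLEpApMIRVA'

# One capturing group: findall returns group 1 for every (zero-width) match.
inner = re.findall(r'(?=M(.*?)p)', text)
print(inner)  # ['GMTPRLGLESLLE', 'TPRLGLESLLE']
```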

You probably want to include the M and p characters in the capturing group:

pattern = re.compile(r'(?=(M.*?p))')

at which point your output becomes:

MGMTPRLGLESLLEp
MTPRLGLESLLEp
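Since the original loop also printed start() and end(), note that those describe the empty lookahead position, not the captured substring. A minimal sketch, assuming you want the real positions of each result, reads them from `m.span(1)` instead:

```python
import re

text = 'GYMGMTPRLGLESLLEpApMIRVA'
pattern = re.compile(r'(?=(M.*?p))')

results = []
for m in pattern.finditer(text):
    # m.start()/m.end() are the zero-width lookahead position;
    # m.span(1) is where the captured substring actually lies.
    start, end = m.span(1)
    results.append((m.group(1), start, end))
    print(m.group(1), start, end)
```

Here both results end at index 17 (exclusive), because they share the same closing p at index 16, so text[start:end] reproduces each captured string.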

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM