
Python regex: search for an expression between two expressions (themselves regexes)

I'd like to extract the substrings of a text that lie between two other strings, where both of those delimiter strings are themselves partly defined by regular expressions.

So, from the following lines:

ALPHA101BETAsomething1GAMMA532DELTA
ALPHA231BETAsomething2GAMMA555DELTA
ALPHA341BETAagainsomethingsomethingGAMMA998DELTA

I'd like to get the following:

something1
something2
againsomethingsomething

My problem is that I cannot work out how to define the opening and closing delimiters, each of which is a fixed string, then three digits, then another fixed string.

So far I tried but failed with this:

re.findall("ALPHA(?:\d\.){3}BETA(.*?)GAMMA(?:\d\.){3}DELTA", pagetext)

How could I instruct the parser that a given regex match group is not the desired result but part of the opening/closing strings?

I modified the regex a little, and now it works for me. You can use re.compile, search, and the match object's group() method to get the substring you are looking for:

import re

# \d{3} matches exactly three digits; the only capturing group is the
# text between the two delimiters.
REGEX = re.compile(r'ALPHA\d{3}BETA(.*?)GAMMA\d{3}DELTA')

# How you loop depends on how your pagetext is formatted.
# If pagetext is a single string containing newlines:
for line in pagetext.split('\n'):
    result = REGEX.search(line)
    if result:  # search() returns None when the line does not match
        your_desired_str = result.group(1)

# If you just want to read the text line by line from a file:
with open(yourfile) as infile:
    for line in infile:
        result = REGEX.search(line)
        if result:
            your_desired_str = result.group(1)
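
For reference, the attempt in the question fails because (?:\d\.){3} asks for a digit followed by a literal dot, three times over, and the sample text contains no dots. The non-capturing syntax (?:...) was not the problem: re.findall returns only the capturing groups, so anything outside (.*?) is treated as part of the delimiters. A minimal sketch, assuming each delimiter always contains exactly three digits and using the sample lines from the question as pagetext:

```python
import re

# Sample input copied from the question.
pagetext = (
    "ALPHA101BETAsomething1GAMMA532DELTA\n"
    "ALPHA231BETAsomething2GAMMA555DELTA\n"
    "ALPHA341BETAagainsomethingsomethingGAMMA998DELTA"
)

# The attempt from the question: (?:\d\.){3} wants "digit, dot" three times.
broken = re.findall(r"ALPHA(?:\d\.){3}BETA(.*?)GAMMA(?:\d\.){3}DELTA", pagetext)

# \d{3} matches exactly three digits. Only (.*?) is a capturing group,
# so findall returns just the text between the delimiters.
fixed = re.findall(r"ALPHA\d{3}BETA(.*?)GAMMA\d{3}DELTA", pagetext)

print(broken)  # []
print(fixed)   # ['something1', 'something2', 'againsomethingsomething']
```

Because . does not match a newline by default, this works on the whole text at once without splitting it into lines first.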

This will work for you:

import re
text = 'ALPHA101BETAsomething1GAMMA532DELTA\nALPHA231BETAsomething2GAMMA555DELTA\nALPHA341BETAagainsomethingsomethingGAMMA998DELTA'

for line in text.split('\n'):
    # [0] assumes every line matches; findall returns [] otherwise.
    print(re.findall(r'ALPHA\d+BETA(.*?)GAMMA\d+DELTA', line)[0])
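
Another option, if the delimiters should not be part of the match at all, is to put them in lookbehind and lookahead assertions. This is only a sketch under one assumption: the opening delimiter has a fixed length, because Python's re module requires fixed-width lookbehind patterns (so \d{3} is allowed there, but \d+ is not). The variable names are illustrative.

```python
import re

# Sample input copied from the question.
text = (
    "ALPHA101BETAsomething1GAMMA532DELTA\n"
    "ALPHA231BETAsomething2GAMMA555DELTA\n"
    "ALPHA341BETAagainsomethingsomethingGAMMA998DELTA"
)

# (?<=...) and (?=...) check for the delimiters without consuming them,
# so the match itself is only the text in between and no group is needed.
pattern = re.compile(r"(?<=ALPHA\d{3}BETA).*?(?=GAMMA\d{3}DELTA)")

matches = pattern.findall(text)
print(matches)  # ['something1', 'something2', 'againsomethingsomething']
```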
