
Python regex: fetch the next line after a string match

I have searched this forum for a close match to my problem but could not find a suitable solution, so I am posting the question.

I am using the urllib and re modules to extract certain sections of a webpage. I am also interested in the status associated with each of those sections.

For example, looking at the source of the webpage:

MY-TEXT #1410 finished subtask PREPARE-WORKSPACE #340418: https://cloud6.foo.bar.com/b/job/PREPARE-WORKSPACE/340418

'>SUCCESS

I am using re.compile and re.findall to extract the text that comes after the pattern "https://cloud6.foo". This matches all of the expected text, which I have confirmed by inspecting the resulting list, but I am losing the status of each task because it sits on the line immediately after the "https://" line.

How can I also extract the line that follows the matched string in this scenario?

Here is the code snippet:

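from urllib.request import urlopen
import re

# urllink is assumed to be defined elsewhere and to hold the page URL;
# decoding as UTF-8 is also an assumption about the page encoding.
webpage = urlopen(urllink).read().decode('utf-8')
# Capture the rest of the line after '<a href=', one quote character and 'https://'
buildPhases = re.compile(r'<a href=\W{1}https\W{3}(.*)')
phaseLists = buildPhases.findall(webpage)
for item in phaseLists:
    print(item)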

To extract the line after a matching string, add .*\n to your regex once for each extra line you want to include. By default . does not match a newline, so .* stops at the end of the current line and \n moves the match onto the next one.
For example, if we take:

MY-TEXT #1410 finished subtask PREPARE-WORKSPACE #340418: https://cloud6.foo.bar.com/b/job/PREPARE-WORKSPACE/340418

'>SUCCESS

and apply the pattern r'https.*\n.*\n.*', the result should be the above string without:

MY-TEXT #1410 finished subtask PREPARE-WORKSPACE #340418:
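
As a rough illustration, here is a minimal sketch that applies this idea to the sample text from the question. It assumes the page source looks exactly like the sample (the URL line, an empty line, then the status line). The second pattern, pair_pattern, is a hypothetical variant that is not part of the original answer; it pulls the URL and the status out as separate groups.

```python
import re

# Sample text copied from the question: URL line, an empty line, then the status line.
sample = (
    "MY-TEXT #1410 finished subtask PREPARE-WORKSPACE #340418: "
    "https://cloud6.foo.bar.com/b/job/PREPARE-WORKSPACE/340418\n"
    "\n"
    "'>SUCCESS\n"
)

# Each ".*\n" consumes the rest of the current line plus its newline,
# so the final ".*" ends up on the status line.
pattern = re.compile(r"https.*\n.*\n.*")
matches = pattern.findall(sample)

# Hypothetical variant (an assumption, not from the answer): capture the URL
# and the status separately. "\s+" skips the line breaks and "\W*" skips the
# leading "'>" punctuation before the status word.
pair_pattern = re.compile(r"(https\S+)\s+\W*(\w+)")
pairs = pair_pattern.findall(sample)

for url, status in pairs:
    print(url, status)
```

If the real page has a different number of lines between the link and the status, the number of .*\n repetitions in the first pattern would need to change to match.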


 