Python splitting text between every instance of two specific strings (Regex)

Question

I'm trying to do some web scraping with Python and I'm having some trouble. I have a large block of scraped text, and I'm trying to generate a list that contains every substring found between two specific strings.

A bunch of lines contain something in the format of "href= /profile/pc/WORD/matches" and I want to create a list of all the WORDs (every word between an instance of " /profile/pc/" and "/matches").

I tried starting with something like this but I'm not even getting any output. Any help on where to go from here?

import re
from urllib.request import Request, urlopen

url = "http://examplewebsite.com"
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
web_byte = urlopen(req).read()
webpage = web_byte.decode('utf-8')
q = webpage.replace('"', '_')    # Replace quotation marks with underscores
print(re.split(r'href=_/profile/pc/', q))

P.S. Previously I did something like this, but I only got the first result.

substring1 = '<a href=_/profile/pc/'   # Starting string before name
substring2 = '/matches_>'   # Ending string after name
my_string = q[(q.index(substring1)+len(substring1)):q.index(substring2)]

Answer 1

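webpage is one long string containing many lines, so you need to go through the lines one at a time and try to match on each of them. Iterating over a string directly yields single characters, so split it into lines first with splitlines().

import re
from urllib.request import Request, urlopen

url = "http://examplewebsite.com"
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
web_byte = urlopen(req).read()
webpage = web_byte.decode('utf-8')
q = webpage.replace('"', '_')    # Replace quotation marks with underscores
for row in q.splitlines():       # Iterate over lines, not characters
    print(re.split(r'href=_/profile/pc/', row))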
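If the goal is only the list of WORDs, re.findall with a capture group may be simpler than splitting: it scans the whole text once and returns every captured piece, so neither the line loop nor the quote replacement is needed. The following is a minimal sketch under stated assumptions. The sample HTML, the helper name extract_profile_names, and the rule that a WORD never contains a slash or a double quote are illustrative and do not come from the original post.

```python
import re

# Hypothetical sample standing in for the scraped page text.
sample = (
    '<a href="/profile/pc/Alice/matches">Alice</a>\n'
    '<a href="/profile/pc/Bob_42/matches">Bob</a>\n'
    '<a href="/about">About</a>\n'
)

# Capture whatever sits between "/profile/pc/" and "/matches".
# Assumption: a name never contains "/" or a double quote.
PROFILE_RE = re.compile(r'/profile/pc/([^/"]+)/matches')


def extract_profile_names(text):
    """Return every captured name, in order of appearance."""
    return PROFILE_RE.findall(text)


names = extract_profile_names(sample)
print(names)
```

With the real page, the same function could be called on webpage directly. The earlier q.index approach returned only one result because index always reports the first occurrence, whereas findall collects all non-overlapping matches.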

solution1
0 2018-03-21 01:02:16
