Here's what I'm trying to do:
Send a GET request to load a page's HTML source.
Search the source for a string and, if it is found, store the whole line that contains it in a variable.
I've searched everywhere, but the explanations I found only cover extracting the whole source or using a dictionary.
For example, using the WWE Page:
Source: view-source: http://network.wwe.com/video/v2525697583?contextType=wwe-show&contextId=wwe_nxt_uk&contentId=300687284&watchlistAltButtonContext=series
I want to extract the line that includes this string:
http://thumbs.media.net.wwe.com/wwe/
Code:
def extract(url):
    html = requests.get(url)
    text = html.text
    word = None
    for line in text:
        if 'http://thumbs.media.net.wwe.com/wwe/' in line:
            word = line
    return word
When I call the function, it returns None, the value word was initially assigned.
NOTE: I only need the first match stored in the variable, not every match.
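Iterating over a string yields one character at a time, not one line, so the test never matches and word stays None. Splitting the text into lines first should work: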
import requests

def extract(url):
    response = requests.get(url)
    searchstr = 'http://thumbs.media.net.wwe.com/wwe/'
    # Split the body into lines and return the first one containing the string
    for line in response.text.split("\n"):
        if searchstr in line:
            return line
    return None
Or, shorter:
def extract(url, searchstr):
    return next((line for line in requests.get(url).text.split("\n") if searchstr in line), None)

print(extract('http://www.url.com', 'http://thumbs.media.net.wwe.com/wwe/'))
Or with a regex, escaping the search string so that its dots are matched literally:
import re

def extract(url, searchstr):
    match = re.search(rf"^(.*{re.escape(searchstr)}.*)$", requests.get(url).text, re.MULTILINE)
    return match.group(1) if match else None

print(extract('http://www.url.com', 'http://thumbs.media.net.wwe.com/wwe/'))
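To check the search logic without a network call, here is a minimal sketch that separates the line search from the download. The helper name first_matching_line and the sample text are made up for illustration and are not the real page source; in real use the text would come from requests.get(url).text. The sketch also lets you confirm on a small string that looping over the text directly visits single characters.

```python
def first_matching_line(text, searchstr):
    """Return the first line of text that contains searchstr, or None."""
    # splitlines() also handles "\r\n" line endings, unlike split("\n")
    for line in text.splitlines():
        if searchstr in line:
            return line
    return None


# Made-up sample standing in for response.text (not the real page source)
sample = (
    "<html>\r\n"
    '<meta content="http://thumbs.media.net.wwe.com/wwe/example.jpg">\r\n'
    "</html>"
)
searchstr = "http://thumbs.media.net.wwe.com/wwe/"

# Iterating over the string directly yields one character per step,
# so the substring test from the question's loop cannot succeed.
char_hits = [item for item in sample if searchstr in item]

print(char_hits)
print(first_matching_line(sample, searchstr))
```

Because the helper returns as soon as it finds a match, only the first matching line is kept, which is what the question asks for.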