How to find a particular URL in an HTML file with python?

Question

There is a URL with a .bin attachment in my HTML file.
My goal is to extract the full link with my Python script. I am running this script across many HTML files and the location of the .bin URL may change.
If I was able to get the index of the beginning of the URL and the end, I could extract it that way.

I tried doing a word search through the HTML files, but there are a few .bin URLs and I only want the first one. Any ideas or other methods would be appreciated.

import urllib.request, urllib.error, urllib.parse

# urlopen needs a full URL, including the scheme
html_link = "http://www.mywebsitelink.com"
response = urllib.request.urlopen(html_link)
webContent = response.read()  # raw page content as bytes

Answer 1

I suggest you look at using a regular expression (regex).

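In your example, you will probably be looking for something like:

https?://[^\s"'<>]+\.bin

The pattern is deliberately not anchored with ^ and $, because the URL sits somewhere inside the HTML rather than on a line by itself. It matches http:// or https://, followed by any run of characters that are not whitespace, quotes, or angle brackets, ending in .bin.

You can test this out and explore what each part of the regular expression means using this helpful tool: regex101

Your code would probably look something like this:

import re

# response.read() returns bytes, so decode it before matching a str pattern
html_text = webContent.decode("utf-8", errors="replace")
match = re.search(r"https?://[^\s\"'<>]+\.bin", html_text)  # first match or None
bin_url = match.group(0) if match else None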
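To see the "first one only" behaviour and the start/end indexes the question asks about, here is a minimal, self-contained sketch. The sample HTML, the pattern, and the helper name find_first_bin_url are illustrative assumptions, not part of the original page. It also assumes URLs in the page are delimited by quotes, whitespace, or angle brackets.

```python
import re

# Assumed pattern: an http(s) URL containing no whitespace, quotes,
# or angle brackets, and ending in .bin
BIN_URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+\.bin")


def find_first_bin_url(html_text):
    """Return (url, start, end) for the first .bin URL, or None if none is found."""
    match = BIN_URL_PATTERN.search(html_text)
    if match is None:
        return None
    return match.group(0), match.start(), match.end()


# Made-up sample page containing two .bin links
sample_html = (
    '<html><body>'
    '<a href="http://example.com/files/first.bin">first</a>'
    '<a href="http://example.com/files/second.bin">second</a>'
    '</body></html>'
)

result = find_first_bin_url(sample_html)
if result is not None:
    url, start, end = result
    print(url)                     # the first .bin URL only
    print(sample_html[start:end])  # the same text, recovered by slicing
```

re.search stops at the first match, so later .bin links in the same page are ignored. If the pages are messy or the link can appear without a scheme, an HTML parser that reads the href attributes may be a more robust choice than a regex.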
