I am learning Selenium and web scraping in Python (v3.6.6, x64). I am trying to write a script that, when executed, automatically downloads the latest available win64 version of geckodriver (v0.22.0 at the time of posting this question) from https://github.com/mozilla/geckodriver/releases to a specific location on my Windows PC.
My problem is that when I look at the page source in Firefox, the id and class of the version I want to download are the same as those of every other available version. Because of that, I cannot isolate the section for that version and get its href to download the file. I am surely missing something, but in spite of several internet searches I cannot figure out what I am doing wrong. I would appreciate it if the experts on Stack Overflow could guide or correct me on the next steps. Below are the things I am trying to solve:
1) Download the win64 version of the latest geckodriver.
2) Save the file to C:\Python.
3) How can the program tell that the file has been downloaded completely, so that it can continue executing?
```python
import sys
from urllib.request import urlopen, urlretrieve

from bs4 import BeautifulSoup

# Define the page where geckodriver can be downloaded
url = "https://github.com/mozilla/geckodriver/releases"

try:
    # Query the website and return the HTML to the variable `page`
    page = urlopen(url)
except Exception:
    # Show a message for any unexpected behaviour when loading the page
    print("Unable to download geckodriver. Press Enter to exit the program.")
    user_input = input()
    sys.exit()

# Parse the HTML using BeautifulSoup and store it in the variable `soup`
soup = BeautifulSoup(page, "html.parser")

# Trying to search and filter the latest win64 version
result = soup.find_all('a', {'class': 'd-flex flex-items-center'})
```
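The script above imports `urlretrieve` but never calls it. For points 2 and 3, here is a minimal sketch, assuming the win64 `.zip` URL has already been found and that the target folder is `C:\Python`. `download_and_verify` is a hypothetical helper written for this illustration, not part of any library. `urlretrieve()` blocks until the transfer ends, so the script only moves on once the download is finished; it also raises `ContentTooShortError` when fewer bytes arrive than the server's `Content-Length` header announced. The `zipfile` check is an extra, optional safeguard on top of that.

```python
import os
import zipfile
from urllib.error import ContentTooShortError
from urllib.request import urlretrieve


def download_and_verify(url, dest_dir):
    """Hypothetical helper: download `url` into `dest_dir`, return the local path.

    urlretrieve() blocks until the transfer finishes, so the file is complete
    once this call returns. If the server sent a Content-Length header and
    fewer bytes arrived, urlretrieve() raises ContentTooShortError instead.
    """
    os.makedirs(dest_dir, exist_ok=True)
    # Keep the remote file name, e.g. geckodriver-v0.22.0-win64.zip
    filename = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
    try:
        path, _headers = urlretrieve(url, filename)
    except ContentTooShortError:
        # Delete the truncated file so a later run cannot mistake it for a good one
        if os.path.exists(filename):
            os.remove(filename)
        raise
    # Optional extra check for a .zip asset: it must open as an archive and
    # every member's CRC must match (testzip() returns None when all are OK).
    if not zipfile.is_zipfile(path):
        raise ValueError(path + " is not a valid zip archive")
    with zipfile.ZipFile(path) as archive:
        bad_member = archive.testzip()
    if bad_member is not None:
        raise ValueError("corrupt member in archive: " + bad_member)
    return path
```

Usage, with `win64_url` standing for the link found on the releases page: `path = download_and_verify(win64_url, r"C:\Python")`. If the href scraped from the page is relative, `urllib.parse.urljoin(url, href)` turns it into a full URL first. Unpacking `geckodriver.exe` afterwards could be done with `zipfile.ZipFile(path).extractall(r"C:\Python")`.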
Because every release on the page uses the same classes, first narrow the search to the latest release block and only then look for the win64 link inside it:
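```python
# `soup` is the BeautifulSoup object built in the question's code.
# find() returns only the first match: the newest release, listed at the top.
latest = soup.find('div', {'class': 'release-entry'})
results = latest.find_all('a', {'class': 'd-flex flex-items-center'})
for result in results:
    href = result.get('href', '')  # default to '' so the `in` checks never see None
    if 'geckodriver/releases/download/' in href and 'win64.zip' in href:
        print(href)
```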
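Class names such as `release-entry` and `d-flex flex-items-center` belong to GitHub's page markup and can change without notice. A different technique that avoids scraping HTML is GitHub's REST API endpoint for a repository's latest release, which returns JSON with a `tag_name` and an `assets` list whose entries carry `name` and `browser_download_url`. The sketch below assumes that endpoint and those field names; `pick_win64_asset` and `latest_win64_url` are hypothetical helpers, and the filter assumes the win64 asset name ends in `win64.zip`, as it did for v0.22.0.

```python
import json
from urllib.request import Request, urlopen

# GitHub REST API endpoint for the newest published (non-draft, non-prerelease) release
LATEST_RELEASE_API = "https://api.github.com/repos/mozilla/geckodriver/releases/latest"


def pick_win64_asset(release):
    """Hypothetical helper: return (tag_name, download URL) of the win64 .zip."""
    for asset in release.get("assets", []):
        # Assumption: the win64 build is the asset whose name ends in "win64.zip"
        if asset["name"].endswith("win64.zip"):
            return release["tag_name"], asset["browser_download_url"]
    raise LookupError("no win64.zip asset in release " + release.get("tag_name", "?"))


def latest_win64_url(api_url=LATEST_RELEASE_API):
    """Hypothetical helper: query the GitHub API and pick the win64 asset."""
    request = Request(api_url, headers={"Accept": "application/vnd.github+json"})
    with urlopen(request, timeout=30) as response:
        release = json.loads(response.read().decode("utf-8"))
    return pick_win64_asset(release)
```

Usage: `tag, win64_url = latest_win64_url()` gives the version tag and a full download URL that can be passed straight to `urlretrieve`. Keeping the JSON parsing in `pick_win64_asset`, separate from the network call, makes the filtering logic testable offline.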