Download latest version of file from website to specific location using Python

I am learning Selenium and web scraping in Python (v3.6.6, x64). I am trying to write a script that, when executed, automatically downloads the latest available win64 version of geckodriver (v0.22.0 at the time of posting this question) from https://github.com/mozilla/geckodriver/releases to a specific location on my Windows PC.

My problem is that when I look at the page source in Mozilla Firefox, the id and class for the specific version I am trying to download are the same as those of all the other available versions. I cannot filter out the specific section and get the href so that the file can be downloaded. I am surely missing something, but in spite of several internet searches I cannot figure out what I am doing wrong. I request the experts on Stack Overflow to guide or correct me on the next steps. Below are the things I am trying to solve:

1) Download the win64 version of the latest geckodriver

2) The file should be downloaded to C:\\Python

3) How can I tell that the program has downloaded the file completely, so that it can continue executing?

from urllib.request import urlopen, urlretrieve
from bs4 import BeautifulSoup

# Define page where geckodriver can be downloaded
url = "https://github.com/mozilla/geckodriver/releases"

try:
    # Query the website and return the html to the variable `page`
    page = urlopen(url)
except Exception:
    # Show a message for any unexpected behaviour when loading the page
    print("Unable to download geckodriver. Press Enter to exit program.")
    user_input = input()
    exit()

# Parse the html using beautifulsoup and store in variable `soup`
soup = BeautifulSoup(page, "html.parser")

# Trying to search and filter latest win64 version
result = soup.find_all('a', {'class': 'd-flex flex-items-center'})

First of all, find the entry for the latest release, then get the win64 link from it:

# The first 'release-entry' div on the page is the latest release
latest = soup.find('div', {'class': 'release-entry'})
results = latest.find_all('a', {'class': 'd-flex flex-items-center'})
for result in results:
    # Default to '' so anchors without an href do not raise a TypeError
    href = result.get('href', '')
    if 'geckodriver/releases/download/' in href and 'win64.zip' in href:
        print(href)
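
The loop above only prints the link, so points 2) and 3) of the question are still open. Below is a minimal sketch of one way to cover them. It assumes the href values are site-relative paths like the ones GitHub used on this page. The helper names pick_win64_href and download_file and the sample hrefs are hypothetical, made up for illustration. urlretrieve is a blocking call, so the program only continues once the whole file has been written to disk.

```python
import os
from urllib.parse import urljoin
from urllib.request import urlretrieve

# Same page as in the question; used to turn relative hrefs into full URLs.
BASE_URL = "https://github.com/mozilla/geckodriver/releases"


def pick_win64_href(hrefs):
    """Return the first href that points at a win64 zip asset, or None."""
    for href in hrefs:
        if href and "geckodriver/releases/download/" in href and href.endswith("win64.zip"):
            return href
    return None


def download_file(url, target_dir):
    """Download url into target_dir and return the path of the saved file."""
    os.makedirs(target_dir, exist_ok=True)
    destination = os.path.join(target_dir, os.path.basename(url))
    # urlretrieve blocks until the response has been fully written to disk,
    # so reaching the return statement means the download has finished.
    saved_path, _headers = urlretrieve(url, destination)
    return saved_path


if __name__ == "__main__":
    # Hypothetical sample values standing in for result.get('href').
    sample_hrefs = [
        "/mozilla/geckodriver/releases/download/v0.22.0/geckodriver-v0.22.0-linux64.tar.gz",
        "/mozilla/geckodriver/releases/download/v0.22.0/geckodriver-v0.22.0-win64.zip",
    ]
    href = pick_win64_href(sample_hrefs)
    # urljoin resolves the site-relative path against the page URL.
    full_url = urljoin(BASE_URL, href)
    print(full_url)
```

With a real link in hand, download_file(full_url, r"C:\Python") would save the zip into C:\Python. If the connection drops before all the bytes announced by the server have arrived, urlretrieve raises urllib.error.ContentTooShortError, so a normal return is a reasonable signal that the file is complete.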
