Selenium - Downloading file - click() works only sometimes

I'm trying to write a script that downloads specific PDF files from BlackRock websites (ishares.com or blackrock.com), but the click() call usually doesn't work. It does work occasionally, though: roughly once every 3-5 runs it manages to download one file.

(When I used a similar script to download all PDFs from those websites, it also worked only once every few runs, and whenever it did work it always downloaded the same files and skipped the rest.)

So, let's say I attempt to download the KIID/KID PDF files from these pages:

https://www.ishares.com/uk/individual/en/products/251857/ishares-msci-emerging-markets-ucits-etf-inc-fund?switchLocale=y&siteEntryPassthrough=true
https://www.ishares.com/ch/individual/en/products/251931/ishares-stoxx-europe-600-ucits-etf-de-fund?switchLocale=y&siteEntryPassthrough=true
https://www.blackrock.com/uk/individual/products/251565/ishares-euro-corporate-bond-large-cap-ucits-etf?switchLocale=y&siteEntryPassthrough=true

with this code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from pyvirtualdisplay import Display
import time


def blackrock_getter(url):
    with Display():
        mime_types = "application/pdf,application/vnd.adobe.xfdf,application/vnd.fdf,application/x-pdf,application/vnd.adobe.xdp+xml"
        profile = webdriver.FirefoxProfile()
        profile.set_preference('browser.download.folderList', 2)
        profile.set_preference('browser.download.manager.showWhenStarting', False)
        profile.set_preference('browser.download.dir', '/home/user/kiid_temp')
        profile.set_preference('browser.helperApps.neverAsk.saveToDisk', mime_types)
        profile.set_preference("plugin.disable_full_page_plugin_for_types", mime_types)
        profile.set_preference('pdfjs.disabled', True)
        driver = webdriver.Firefox(firefox_profile=profile)
        driver.get(url)
        try:
            element = WebDriverWait(driver, 20).until(
            EC.element_to_be_clickable((By.XPATH, ("//header[@class='main-header']//a[@class='icon-pdf'][1]"))))
            driver.execute_script("arguments[0].click();", element)
        finally:
            driver.quit()
        time.sleep(3)  # very precise mechanism to wait until the download is complete


def main():
    urls_file = open('urls_list.txt', 'r')  # the URLs I pasted above
    for url in urls_file.readlines():
        if url[-1:] == "\n":
            url = url[:-1]
        if url[0:4] == "http":
            filename = url.split('?')[0]
            filename = filename.split('/')[-1]
            if 'blackrock.com/' in url or 'ishares.com/' in url:
                print(f"Processing {filename}...")
                blackrock_getter(url)


main()

The result is (every once in a while) one file: kiid-ishares-msci-emerging-markets-ucits-etf-dist-gb-ie00b0m63177-en.pdf.

Any ideas how to fix this?

You could try the pyautogui module, but you wouldn't be able to use your computer while the program is running.

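It looks like the script finishes before the file download does; in other words, the download is not completing within the 3 seconds. Here is a method that waits until the PDF download completes. Note that it reads chrome://downloads, so it applies to Chrome rather than the Firefox driver used in the question.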

import time

# method to get the downloaded file name
# (assumes `driver` is a module-level Chrome WebDriver that already exists)
def getDownLoadedFileName(waitTime):
    driver.execute_script("window.open()")
    # switch to new tab
    driver.switch_to.window(driver.window_handles[-1])
    # navigate to chrome downloads
    driver.get('chrome://downloads')
    # define the endTime
    endTime = time.time()+waitTime
    while True:
        try:
            # get downloaded percentage
            downloadPercentage = driver.execute_script(
                "return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('#progress').value")
            # check if downloadPercentage is 100 (otherwise the script will keep waiting)
            if downloadPercentage == 100:
                # return the file name once the download is completed
                return driver.execute_script("return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('div#content  #file-link').text")
        except Exception:
            # the downloads list or progress element is not available yet; retry
            pass
        time.sleep(1)
        if time.time() > endTime:
            # give up after waitTime seconds; None signals a timeout
            return None
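
Since the question uses Firefox, a browser-independent alternative is to watch the download folder instead of the browser's downloads page. The following is only a minimal sketch, not a tested fix for the BlackRock pages: the function name wait_for_download and its parameters are made up for illustration, and it assumes Firefox keeps a temporary ".part" file next to the target file while a download is still in progress.

```python
import os
import time


def wait_for_download(download_dir, timeout=30.0, poll_interval=0.5, known_files=()):
    """Return the path of a newly completed file in download_dir, or None on timeout.

    Assumption: an in-progress Firefox download leaves a "*.part" file in the
    folder, so the download counts as finished once a new non-".part" file
    exists and no ".part" files remain.
    """
    known = set(known_files)  # files that were already there before the click
    deadline = time.time() + timeout
    while time.time() < deadline:
        names = set(os.listdir(download_dir))
        in_progress = [n for n in names if n.endswith(".part")]
        finished = sorted(n for n in names - known if not n.endswith(".part"))
        if finished and not in_progress:
            return os.path.join(download_dir, finished[0])
        time.sleep(poll_interval)
    return None  # nothing new finished downloading within the timeout
```

In the question's blackrock_getter, this would be called after the click and before driver.quit(), for example with '/home/user/kiid_temp' as download_dir and the os.listdir() result taken before the click as known_files. As written, the question's code quits the browser right after the click and only then sleeps, so the browser may already be closed while the file is still downloading.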


 