
How to download a file opened in a new tab with Selenium

I want to download the files listed at https://www.osc.ca/en/securities-law/osc-bulletin?keyword=61-101&date%5Bmin%5D=&date%5Bmax%5D=&sort_bef_combine=field_start_date_DESC, which shows the results of searching for the keyword '61-101'. Here is my code:

service = Service(r"C:\Users\Lenovo\Desktop\chromedriver.exe")
driver = webdriver.Chrome(service=service)

driver.get('https://www.osc.ca/en/securities-law/osc-bulletin')

search = driver.find_element(By.XPATH, '//*[@id="edit-keyword"]')

search_word = '61-101'
search.send_keys(search_word)
search.send_keys(Keys.ENTER)

for i in range(1, 21):
    sleep(2)
    issue_path = '//*[@id="block-osc-glider-content"]/article/section[3]/div[2]/section[3]/div/div/div/div/div[2]/div/div[3]/div/div[2]/div[' + str(i) +   ']/div/div[1]/div[2]/span[1]/a'
    issue = driver.find_element(By.XPATH, issue_path)
    issue.send_keys(Keys.ENTER)   
        
    driver.switch_to.window(driver.window_handles[1])
    
    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//*[@id="icon"]/iron-icon//svg')))

    download = driver.find_element(By.XPATH, '//*[@id="icon"]/iron-icon//svg')
    download.send_keys(Keys.ENTER)
    driver.switch_to.window(driver.window_handles[0]) 

However, this raises a TimeoutException. I tried several different XPaths for the download button, but the driver still could not find the download element. I suspect the problem is that the driver is not switching to the new tab.

options = Options()
download_dir = os.getcwd()
prefs = {
   "download.default_directory": download_dir,
   "download.prompt_for_download": False,
   "download.directory_upgrade": True,
   "plugins.always_open_pdf_externally": True
}
options.add_experimental_option("prefs", prefs)
service = Service(r"C:\Users\Lenovo\Desktop\chromedriver.exe")
driver = webdriver.Chrome(service=service, options=options)
wait = WebDriverWait(driver, 20)
driver.get('https://www.osc.ca/en/securities-law/osc-bulletin')

search = driver.find_element(By.XPATH, '//*[@id="edit-keyword"]')

search_word = '61-101'
search.send_keys(search_word)
search.send_keys(Keys.ENTER)

for i in range(1, 21):
    sleep(2)
    issue_path = '//*[@id="block-osc-glider-content"]/article/section[3]/div[2]/section[3]/div/div/div/div/div[2]/div/div[3]/div/div[2]/div[' + str(i) +   ']/div/div[1]/div[2]/span[1]/a'
    issue = driver.find_element(By.XPATH, issue_path)
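    issue.send_keys(Keys.ENTER)
    # Wait for the PDF tab to open before switching to it.
    wait.until(EC.number_of_windows_to_be(2))
    driver.switch_to.window(driver.window_handles[1])
    # With the prefs above the PDF is downloaded instead of displayed,
    # so close the tab and return to the results page.
    driver.close()
    driver.switch_to.window(driver.window_handles[0])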

To download all the PDFs, set Chrome preferences in the options so that they are downloaded automatically instead of being opened in the viewer. I set the download directory to the current working directory (os.getcwd()), but you can change download_dir to any folder path you want.
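Since the files are saved in the background, it can help to check that they have finished before quitting the driver. The following is only a rough sketch, not part of the code above: wait_for_downloads is a hypothetical helper name, and it relies on Chrome keeping in-progress downloads under a temporary '.crdownload' suffix. It does not detect a download that has not started yet, so call it only after the last tab has been opened.

```python
import os
import time


def wait_for_downloads(download_dir, timeout=60.0, poll=0.5):
    """Block until no '.crdownload' files remain in download_dir.

    Hypothetical helper: returns the sorted file names in the directory,
    or raises TimeoutError if partial downloads remain after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        names = os.listdir(download_dir)
        # Chrome keeps an in-progress download under a '.crdownload' name.
        if not any(name.endswith(".crdownload") for name in names):
            return sorted(names)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"downloads still in progress in {download_dir!r}")
        time.sleep(poll)
```

After the loop you could call wait_for_downloads(download_dir) and then driver.quit().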

I'd also suggest adding a wait, so that you only switch once there is more than one window handle.
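As an alternative to indexing window_handles[1], one option is to remember the handles before the click and switch to whichever handle is new. The sketch below is an assumption on my part rather than the code above: switch_to_new_window is a hypothetical helper, written as a plain polling loop so it works with any object that exposes window_handles and switch_to.window. With a real driver, WebDriverWait(driver, 10).until(EC.new_window_is_opened(handles_before)) performs the same wait.

```python
import time


def switch_to_new_window(driver, handles_before, timeout=10.0, poll=0.2):
    """Wait for a handle that is not in handles_before, switch to it, return it.

    Hypothetical helper: raises TimeoutError if no new window or tab appears.
    """
    known = set(handles_before)
    deadline = time.monotonic() + timeout
    while True:
        new_handles = [h for h in driver.window_handles if h not in known]
        if new_handles:
            driver.switch_to.window(new_handles[0])
            return new_handles[0]
        if time.monotonic() >= deadline:
            raise TimeoutError("no new window or tab appeared")
        time.sleep(poll)


# Possible use inside the loop (driver and issue come from the code above):
#     before = driver.window_handles
#     issue.send_keys(Keys.ENTER)
#     switch_to_new_window(driver, before)
```

This avoids relying on the order of window_handles.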

Imports:

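import os
from time import sleep
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC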
