The file download path setting does not take effect in Python Selenium with headless Chrome

I am a web developer in Korea. We have recently started using Python to implement a website crawling feature.

I'm new to Python. We spent about two days searching for solutions and trying them out. The current issues are:

  1. Clicking the Excel download button opens a new window (pop-up).
  2. Clicking Download in the new window opens a new tab in the parent window, and the whole browser shuts down as soon as the download starts.
  3. The download page is PHP; it sends the data as Excel via response headers, so the browser automatically treats it as a download.
  4. The problem is that the browser shuts down before the download completes, and the file is not saved.

I used the following source code.

import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

chrome_driver = './browser_driver/chromedriver'

options = webdriver.ChromeOptions()
options.add_argument('--headless')

download_path = r"C:\Users\files"

timeout = 10

driver = webdriver.Chrome(executable_path=chrome_driver, chrome_options=options)
driver.command_executor._commands["send_command"] = (
    "POST", '/session/$sessionId/chromium/send_command')
params = {'cmd': 'Page.setDownloadBehavior',
          'params': {'behavior': 'allow', 'downloadPath': download_path}}
command_result = driver.execute("send_command", params)
driver.get("site_url")

#download new window
down_xls_btn = driver.find_element_by_id("download")
down_xls_btn.click()

driver.switch_to_window(driver.window_handles[1])

#download start
down_xls_btn = driver.find_element_by_id("download2")
down_xls_btn.click()

When testing without headless mode, the browser itself shuts down as soon as the download starts. In headless mode, the file is not downloaded at all.

Commenting out the DevTools code related to Page.setDownloadBehavior stops the shutdown, but then the download path is not applied.

I am not good at English, so I used a translator to write this. It is very hard for me as a beginner. Please help me.


I just tested it with Firefox. Unlike Chrome, Firefox opens the download page in a new window rather than a new tab; the download starts automatically there, and the window then closes by itself.

There is still a problem, though. The download itself succeeded in Firefox, even in headless mode. However, once the new window had closed, the driver created earlier (the one used for driver.get()) was no longer recognized.

import os
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.firefox.options import Options
import json

fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList", 2)
fp.set_preference("browser.download.manager.showWhenStarting", False)
fp.set_preference("browser.download.dir",download_path)
fp.set_preference("browser.helperApps.neverAsk.saveToDisk","application/octet-stream, application/vnd.ms-excel")
fp.set_preference("dom.webnotifications.serviceworker.enabled",False)
fp.set_preference("dom.webnotifications.enabled",False)

timeout = 10
# Note: download_path, geckodriver, options and siteurl are not defined in this excerpt.
driver = webdriver.Firefox(executable_path=geckodriver, firefox_options=options, firefox_profile=fp)
driver.get(siteurl)

down_btn = driver.find_element_by_xpath('//*[@id="searchform"]/div/div[1]/div[6]/div/a[2]')
down_btn.click()

# Clicking down_btn opens a new window.
# The download starts automatically in the new window, which then closes by itself.

driver.switch_to_window(driver.window_handles[0])

# After switching back to the main window, printing its title raises an error.
print(driver.title)

This may be the same problem as the one described above. Since the download currently succeeds in Firefox, we have worked around it by creating a new driver for the post-processing.

Has anyone solved this problem?

I came across the same issue and managed to solve it this way:

After you switch to the other window, you should enable the download again:

  1. Isolate this code into a function:
def enable_download_in_headless_chrome(driver, download_path):
    # Register the ChromeDriver endpoint used to send raw DevTools commands.
    driver.command_executor._commands["send_command"] = ("POST", '/session/$sessionId/chromium/send_command')
    # Allow downloads for the current window and set the target directory.
    params = {
        'cmd': 'Page.setDownloadBehavior',
        'params': {'behavior': 'allow', 'downloadPath': download_path}
    }

    driver.execute("send_command", params)
  2. Call it whenever you need to download a file from another window.
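
On Selenium 4, Chromium-based drivers expose `execute_cdp_cmd`, which sends the same DevTools command without registering the `send_command` endpoint by hand. A minimal sketch of the helper written that way, assuming such a Selenium version is installed; the name `enable_download_via_cdp` and the conversion to an absolute path are my own choices, not part of the approach described here:

```python
import os


def enable_download_via_cdp(driver, download_path):
    """Allow downloads in the driver's current window and save them to download_path.

    Assumes `driver` is a Chromium-based Selenium WebDriver that provides
    execute_cdp_cmd. The helper name is hypothetical.
    """
    # An absolute directory is the safe choice for downloadPath.
    absolute_path = os.path.abspath(download_path)
    driver.execute_cdp_cmd(
        "Page.setDownloadBehavior",
        {"behavior": "allow", "downloadPath": absolute_path},
    )
    return absolute_path
```

Like the function in step 1, it would be called once after creating the driver and again after each switch to a new window.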

Your code will then be:

import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

chrome_driver = './browser_driver/chromedriver'

options = webdriver.ChromeOptions()
options.add_argument('--headless')

download_path = r"C:\Users\files"

timeout = 10

driver = webdriver.Chrome(executable_path=chrome_driver, chrome_options=options)
enable_download_in_headless_chrome(driver, download_path)

driver.get("site_url")

#download new window
down_xls_btn = driver.find_element_by_id("download")
down_xls_btn.click()

driver.switch_to_window(driver.window_handles[1])
enable_download_in_headless_chrome(driver, download_path)  # THIS IS THE MISSING AND SUPER IMPORTANT PART

#download start
down_xls_btn = driver.find_element_by_id("download2")
down_xls_btn.click()
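
One thing the script above does not do is wait after the last click: if the script ends or calls `driver.quit()` right away, the browser may close before the file is fully written. A minimal sketch of a polling helper that could follow the click, assuming Chrome's usual behavior of giving in-progress downloads a `.crdownload` suffix; the name `wait_for_download` and its parameters are hypothetical:

```python
import os
import time


def wait_for_download(download_path, before=(), timeout=30.0, poll_interval=0.5):
    """Block until a new, finished file appears in download_path; return the new names.

    `before` holds the file names that existed before the download was started.
    Assumption: in-progress Chrome downloads end in ".crdownload".
    Raises TimeoutError if nothing finishes within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        names = os.listdir(download_path) if os.path.isdir(download_path) else []
        partial = [n for n in names if n.endswith(".crdownload")]
        finished = sorted(
            n for n in names if n not in before and not n.endswith(".crdownload")
        )
        # Done once a new file exists and no partial download is left.
        if finished and not partial:
            return finished
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no completed download in {download_path!r}")
        time.sleep(poll_interval)
```

In the script above, `before = set(os.listdir(download_path))` would be taken before the final click, and `wait_for_download(download_path, before=before)` called after it, ahead of any `driver.quit()`.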
