
The file download path setting in Python Selenium headless Chrome does not apply

I am a web developer in Korea. We have recently been using Python to implement a website crawling feature.

I'm new to Python. We spent about two days searching for solutions and tried applying them. The current issues are:

  1. Clicking the Excel download button opens a new (pop-up) window.
  2. Clicking Download in the new window opens a new tab in the parent window, and all browsers shut down as soon as the download starts.
  3. The download page is written in PHP and sends the data as an Excel file through response headers, so the browser treats it as a download automatically.
  4. The problem is that the browser shuts down before the download completes, so the file is never saved.

I used the following source code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

chrome_driver = './browser_driver/chromedriver'

options = webdriver.ChromeOptions()
options.add_argument('--headless')

download_path = r"C:\Users\files"  # use an absolute path

timeout = 10

# Selenium 4: the driver path goes through Service and the options through options=
# (executable_path= and chrome_options= were removed in later releases)
driver = webdriver.Chrome(service=Service(chrome_driver), options=options)

# Register chromedriver's raw DevTools endpoint and allow downloads into download_path
driver.command_executor._commands["send_command"] = (
    "POST", '/session/$sessionId/chromium/send_command')
params = {'cmd': 'Page.setDownloadBehavior',
          'params': {'behavior': 'allow', 'downloadPath': download_path}}
command_result = driver.execute("send_command", params)
driver.get("site_url")

# the first button opens a new window
down_xls_btn = driver.find_element(By.ID, "download")
down_xls_btn.click()

# wait until the pop-up exists before switching to it
WebDriverWait(driver, timeout).until(EC.number_of_windows_to_be(2))
driver.switch_to.window(driver.window_handles[1])

# the second button starts the download
down_xls_btn = driver.find_element(By.ID, "download2")
down_xls_btn.click()

When testing without headless mode, the browser itself shuts down as soon as the download starts. In headless mode, the file is not downloaded at all.
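One likely reason for the shutdown is that the script ends right after the last `click()`: when a Python Selenium script exits, the driver process is stopped and the browser it launched goes down with it, cutting off the download. A minimal sketch of a workaround, assuming the file lands in `download_path` and that the browser marks unfinished downloads with a temporary suffix (`.crdownload` in Chrome, `.part` in Firefox), is to poll the folder until a new, finished file appears and only then quit. The helper name `wait_for_download` is hypothetical.

```python
import os
import time

# Assumed temporary suffixes for in-progress downloads (Chrome, Firefox).
PARTIAL_SUFFIXES = (".crdownload", ".part")


def wait_for_download(directory, before, timeout=60.0, poll=0.5):
    """Block until a new, fully written file appears in `directory`.

    `before` is the set of file names present before the download was triggered.
    Returns the path of the finished file, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        new = set(os.listdir(directory)) - before
        in_progress = [n for n in new if n.endswith(PARTIAL_SUFFIXES)]
        finished = sorted(n for n in new if not n.endswith(PARTIAL_SUFFIXES))
        # Firefox may create an empty placeholder next to the .part file,
        # so only accept a result once no partial file remains.
        if finished and not in_progress:
            return os.path.join(directory, finished[0])
        time.sleep(poll)
    raise TimeoutError(f"no completed download in {directory!r} after {timeout} s")
```

Take the snapshot before clicking, e.g. `before = set(os.listdir(download_path))`, then `down_xls_btn.click()`, `wait_for_download(download_path, before)`, and only after that `driver.quit()`.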

Commenting out the DevTools code related to Page.setDownloadBehavior stops the browser from shutting down, but then the download path does not change.
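Without the DevTools call, Chrome falls back to its own download preferences, which is presumably why the path no longer changes. A sketch of setting the folder through ChromeOptions instead, assuming Chrome's `download.default_directory` and `download.prompt_for_download` preferences; the helper name `chrome_download_prefs` is hypothetical, and older headless Chrome reportedly ignored these preferences, which is why the DevTools command was used in the first place.

```python
import os


def chrome_download_prefs(download_dir):
    """Chrome user preferences that set the default download folder.

    Intended for ChromeOptions.add_experimental_option("prefs", ...).
    """
    return {
        # folder where Chrome saves downloads (absolute path)
        "download.default_directory": os.path.abspath(download_dir),
        # save directly instead of asking where to save
        "download.prompt_for_download": False,
    }
```

Use it before creating the driver: `options.add_experimental_option("prefs", chrome_download_prefs(download_path))`. The newer headless mode (`--headless=new`) is reported to honour this preference; check it against your own Chrome version.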

My English is not good, so I used a translator. This is very hard for me as a beginner. Please help me.


I just tested it with Firefox. Unlike Chrome, Firefox opens the download in a new window rather than a new tab; that window starts the download automatically and then closes itself.

There is a problem here, though. The download actually succeeds in Firefox, even in headless mode. However, after the new window closes, the driver created earlier (the one used for driver.get()) is no longer recognized.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.firefox.service import Service

# Placeholders (not defined in the original snippet); adjust to your setup
geckodriver = './browser_driver/geckodriver'
download_path = r"C:\Users\files"
siteurl = "site_url"

fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList", 2)  # 2 = save to browser.download.dir
fp.set_preference("browser.download.manager.showWhenStarting", False)
fp.set_preference("browser.download.dir", download_path)
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/octet-stream, application/vnd.ms-excel")
fp.set_preference("dom.webnotifications.serviceworker.enabled", False)
fp.set_preference("dom.webnotifications.enabled", False)

options = Options()
options.add_argument('-headless')
options.profile = fp  # Selenium 4: attach the profile through the options

timeout = 10
driver = webdriver.Firefox(service=Service(geckodriver), options=options)
driver.get(siteurl)

down_btn = driver.find_element(By.XPATH, '//*[@id="searchform"]/div/div[1]/div[6]/div/a[2]')
down_btn.click()

# Clicking down_btn opens a new window.
# The new window starts the download automatically and then closes itself.

driver.switch_to.window(driver.window_handles[0])

# Switching back to the main window via window_handles and printing its title raises an error.
print(driver.title)
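A possible cause of the Firefox error is that the script asks for the main window while the self-closing pop-up is still open or in the middle of closing. A minimal sketch, assuming the pop-up is the only extra window: save the main window's handle before clicking, wait until the pop-up has appeared and closed again, and then switch back to that saved handle instead of indexing `window_handles`. The function name and timeouts are illustrative, not part of Selenium.

```python
import time


def return_to_main_window(driver, main_handle, timeout=30.0, poll=0.2):
    """Wait for the self-closing pop-up to open and close, then switch to main_handle.

    `driver` is any Selenium WebDriver; `main_handle` is driver.current_window_handle
    recorded before the click. Raises TimeoutError if the pop-up never comes and goes.
    """
    deadline = time.monotonic() + timeout
    seen_popup = False
    while time.monotonic() < deadline:
        handles = list(driver.window_handles)
        if len(handles) > 1:
            seen_popup = True  # the pop-up is open
        elif seen_popup and handles == [main_handle]:
            # the pop-up has closed; only the main window is left
            driver.switch_to.window(main_handle)
            return main_handle
        time.sleep(poll)
    raise TimeoutError(f"pop-up did not open and close within {timeout} s")
```

Usage: `main = driver.current_window_handle` before `down_btn.click()`, then `return_to_main_window(driver, main)` and `print(driver.title)`. Because it polls, a pop-up that opens and closes faster than `poll` would be missed; in that case waiting for the downloaded file is the more reliable signal.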

Perhaps this is the same problem as the one I asked about earlier. Since the download currently works in Firefox, we wrote code that creates a new driver and continues with the post-processing.

Has anyone solved this problem?

I came across the same issue and managed to solve it this way:

After you switch to the other window, you have to enable downloads again:

  1. Isolate this code into a function:
def enable_download_in_headless_chrome(driver, download_path):
    # Register chromedriver's raw DevTools endpoint under the name "send_command"
    # (this relies on the private command_executor._commands dict)
    driver.command_executor._commands["send_command"] = ("POST", '/session/$sessionId/chromium/send_command')
    # Page.setDownloadBehavior is a page-level setting, so it has to be sent
    # again after switching to another window or tab
    params = {
        'cmd': 'Page.setDownloadBehavior',
        'params': {'behavior': 'allow', 'downloadPath': download_path}
    }

    driver.execute("send_command", params)
  2. Call it whenever you need to download a file from another window.
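On Selenium 4, the Chrome driver exposes DevTools commands through the public `execute_cdp_cmd` method, so patching the private `command_executor._commands` dict should not be necessary. A sketch of the same helper on that API (the name `enable_download_cdp` is hypothetical); it sends the same `Page.setDownloadBehavior` command, so, as the answer says, it still has to be called again after every switch to another window.

```python
import os


def enable_download_cdp(driver, download_path):
    """Allow downloads into download_path for the current page (Selenium 4 sketch).

    `driver` is a Chrome/Chromium WebDriver, which provides execute_cdp_cmd.
    """
    params = {
        "behavior": "allow",
        # pass an absolute path to be safe
        "downloadPath": os.path.abspath(download_path),
    }
    return driver.execute_cdp_cmd("Page.setDownloadBehavior", params)
```

Call it once after creating the driver and again right after `driver.switch_to.window(...)`, exactly where the answer calls `enable_download_in_headless_chrome`.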

Your code will then be:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

chrome_driver = './browser_driver/chromedriver'

options = webdriver.ChromeOptions()
options.add_argument('--headless')

download_path = r"C:\Users\files"

timeout = 10

driver = webdriver.Chrome(service=Service(chrome_driver), options=options)
enable_download_in_headless_chrome(driver, download_path)  # function defined above

driver.get("site_url")

#download new window
down_xls_btn = driver.find_element(By.ID, "download")
down_xls_btn.click()

WebDriverWait(driver, timeout).until(EC.number_of_windows_to_be(2))
driver.switch_to.window(driver.window_handles[1])
enable_download_in_headless_chrome(driver, download_path)  # THIS IS THE MISSING AND SUPER IMPORTANT PART

#download start
down_xls_btn = driver.find_element(By.ID, "download2")
down_xls_btn.click()
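The Chrome DevTools Protocol documentation marks `Page.setDownloadBehavior` as deprecated in favour of `Browser.setDownloadBehavior`, which takes the same `behavior` and `downloadPath` parameters but targets the browser rather than a single page. A sketch, assuming Selenium 4's `execute_cdp_cmd` and a Chrome version that accepts this command from a WebDriver session; if it works in your setup, a single call before the first click may make the per-window call unnecessary. The helper name is hypothetical.

```python
import os


def enable_download_browser_wide(driver, download_path):
    """Allow downloads for the whole browser, not only the current tab (sketch).

    `driver` is a Chrome/Chromium WebDriver providing execute_cdp_cmd.
    """
    return driver.execute_cdp_cmd(
        "Browser.setDownloadBehavior",
        {"behavior": "allow", "downloadPath": os.path.abspath(download_path)},
    )
```

Whether the setting really carries over to the pop-up depends on the Chrome build, so it is worth checking that the file appears in `download_path` before dropping the second `enable_download_in_headless_chrome` call.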
