
Selenium and headless chrome downloads via python

Downloading files via headless Chrome with Selenium still seems to be a problem; it was asked here over a month ago and received no answer. I don't understand how they are implementing the JS that appears in the bug thread. Is there an option I can add, or a current fix for this? The original bug page is located here. Everything I'm using is up to date as of today, 10/22/17.

In Python:

from selenium import webdriver


options = webdriver.ChromeOptions()

prefs = {"download.default_directory": "C:/Stuff", 
         "download.prompt_for_download": False,
         "download.directory_upgrade": True, 
         "plugins.always_open_pdf_externally": True
         }

options.add_experimental_option("prefs", prefs)
options.add_argument('headless')
driver = webdriver.Chrome(r'C:/Users/aaron/chromedriver.exe', chrome_options = options)

# test file to download which doesn't work
driver.get('http://ipv4.download.thinkbroadband.com/5MB.zip')

If the headless option is removed, this works with no problem.

The actual files I'm attempting to download are PDFs located at .aspx URLs. I'm downloading them by doing a .click(), and it works great, except in headless mode. The hrefs are javascript do_postback scripts.

Why don't you locate the anchor href and then use a GET request to download the file? This way it will work in headless mode and will be much faster. I have done that in C#.

import requests

def download_file(url):
    # Use the last URL path segment as the local file name
    local_filename = url.split('/')[-1]
    # stream=True avoids loading the whole file into memory at once
    r = requests.get(url, stream=True)
    with open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
    return local_filename
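
Since the files in the question sit behind .aspx pages, a plain GET may need the browser's session cookies to be accepted. The following is a minimal sketch of one way to carry them over, assuming the server only checks cookies; `cookie_header` and `download_with_browser_session` are hypothetical helper names, not part of Selenium or requests.

```python
def cookie_header(selenium_cookies):
    """Build a Cookie header value from the list returned by driver.get_cookies()."""
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


def download_with_browser_session(driver, url, local_filename):
    """Download url with requests, reusing the cookies of a Selenium session.

    Assumption: the site authorizes the download from cookies alone.
    """
    import requests  # imported lazily so cookie_header has no dependencies

    headers = {"Cookie": cookie_header(driver.get_cookies())}
    with requests.get(url, headers=headers, stream=True) as r:
        r.raise_for_status()  # fail loudly on 4xx/5xx instead of saving an error page
        with open(local_filename, "wb") as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    return local_filename
```

This only helps when the link resolves to a real URL; a javascript postback href would still need the form request to be reproduced by hand.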

I believe that now that Chromium supports this feature (per the bug ticket you linked), it falls to the chromedriver team to add support for it. There is an open ticket here, but it does not appear to have a high priority at the moment. Please, everyone who needs this feature, go give it a +1!

For those of you who aren't following the Chromium ticket linked above or haven't found a solution: this is working for me. Chrome is updated to v65, and chromedriver and selenium are both up to date as of 4/16/18.

    from selenium import webdriver

    options = webdriver.ChromeOptions()
    prefs = {'download.prompt_for_download': False,
             'download.directory_upgrade': True,
             'safebrowsing.enabled': False,
             'safebrowsing.disable_download_protection': True}

    options.add_argument('--headless')
    options.add_experimental_option('prefs', prefs)
    driver = webdriver.Chrome('chromedriver.exe', chrome_options=options)
    # Register chromedriver's send_command endpoint so driver.execute can reach it
    driver.command_executor._commands["send_command"] = ("POST", '/session/$sessionId/chromium/send_command')
    # Kept from the original answer; its purpose is unclear and it may not be needed
    driver.desired_capabilities['browserName'] = 'ur mum'
    # Ask headless Chrome to allow downloads and save them to downloadPath
    params = {'cmd': 'Page.setDownloadBehavior', 'params': {'behavior': 'allow', 'downloadPath': r'C:\chickenbutt'}}
    driver.execute("send_command", params)
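
Newer Selenium releases expose `execute_cdp_cmd` on the Chrome driver, which can send the same `Page.setDownloadBehavior` command without patching `_commands`. A minimal sketch, assuming a Selenium version that provides that method; the helper names below are hypothetical.

```python
def download_behavior_params(download_dir):
    """Parameters for the DevTools command Page.setDownloadBehavior."""
    return {"behavior": "allow", "downloadPath": download_dir}


def allow_headless_downloads(driver, download_dir):
    """Send Page.setDownloadBehavior through the driver's execute_cdp_cmd method.

    Assumption: driver is a Chrome WebDriver that has execute_cdp_cmd.
    """
    driver.execute_cdp_cmd("Page.setDownloadBehavior",
                           download_behavior_params(download_dir))
```

It would be called once after the driver is created, for example `allow_headless_downloads(driver, r'C:\chickenbutt')`, before clicking the download link.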

If you're getting a "Failed - file path too long" error when downloading, make sure that the download path doesn't have a trailing space, slash, or backslash. The path must also use backslashes only. I have no idea why.
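
The path rules above can be applied before the path is handed to `Page.setDownloadBehavior`. A minimal sketch with a hypothetical helper, `normalize_download_path`; it only implements the rules reported in this answer and does not handle a bare drive root such as `C:\`.

```python
def normalize_download_path(path):
    """Use backslashes only and drop trailing spaces, slashes and backslashes."""
    cleaned = path.strip().replace("/", "\\")
    # Remove any mix of trailing backslashes and spaces left at the end
    return cleaned.rstrip("\\ ")
```

For example, `normalize_download_path("C:/Stuff/ ")` yields `C:\Stuff`.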
