
Selenium and headless Chrome downloads via Python

Downloading files via headless Chrome with Selenium still seems to be a problem: it was asked here over a month ago and got no answer. I don't understand how people are implementing the JS that appears in the bug thread. Is there an option I can add, or a current fix for this? The original bug page is located here. Everything I'm using is up to date as of today, 10/22/17.

In Python:

from selenium import webdriver


options = webdriver.ChromeOptions()

prefs = {"download.default_directory": "C:/Stuff", 
         "download.prompt_for_download": False,
         "download.directory_upgrade": True, 
         "plugins.always_open_pdf_externally": True
         }

options.add_experimental_option("prefs", prefs)
options.add_argument('headless')
driver = webdriver.Chrome(r'C:/Users/aaron/chromedriver.exe', chrome_options = options)

# test file to download which doesn't work
driver.get('http://ipv4.download.thinkbroadband.com/5MB.zip')

If the headless option is removed, this works without any problem.

The actual files I'm attempting to download are PDFs located at .aspx URLs. I download them with a .click(), which works great except in headless mode. The hrefs are JavaScript do_postback scripts.

Why don't you locate the anchor's href and then use a GET request to download the file? That way it will work in headless mode and will be much faster. I have done that in C#.

import requests


def download_file(url):
    # Name the local file after the last segment of the URL
    local_filename = url.split('/')[-1]
    # NOTE the stream=True parameter: the body is fetched in chunks, not all at once
    r = requests.get(url, stream=True)
    with open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
                # f.flush() commented by recommendation from J.F.Sebastian
    return local_filename
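A plain GET like the one above carries none of the browser's session state, and the question mentions .aspx pages, which often sit behind a login. The following is only a sketch of one way to carry Selenium's cookies over to a separate HTTP request. It assumes the list-of-dicts shape returned by `driver.get_cookies()` and uses the standard library's `urllib.request` so that it stays self-contained. The helper names `cookie_header` and `build_request` are made up for illustration, and whether the server accepts such a request (or exposes a direct URL at all, given the postback links) is not something this thread confirms.

```python
import urllib.request


def cookie_header(selenium_cookies):
    """Build a Cookie header value from driver.get_cookies() output.

    Each entry is expected to be a dict with at least 'name' and 'value'.
    """
    return "; ".join(
        "{}={}".format(c["name"], c["value"]) for c in selenium_cookies
    )


def build_request(url, selenium_cookies):
    """Return a urllib Request for url that sends the browser's cookies."""
    headers = {"Cookie": cookie_header(selenium_cookies)}
    return urllib.request.Request(url, headers=headers)


def download_with_cookies(url, selenium_cookies, local_filename):
    """Stream url to local_filename in 1 KB chunks (hypothetical helper)."""
    request = build_request(url, selenium_cookies)
    with urllib.request.urlopen(request) as response, \
            open(local_filename, "wb") as f:
        while True:
            chunk = response.read(1024)
            if not chunk:
                break
            f.write(chunk)
    return local_filename
```

With a live driver this would be called as `download_with_cookies(pdf_url, driver.get_cookies(), 'file.pdf')`, where `pdf_url` stands for whatever direct link you managed to extract.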

I believe that now that Chromium supports this feature (see the bug ticket you linked to), it falls to the chromedriver team to add support for it. There is an open ticket here, but it does not appear to have a high priority at the moment. Please, everyone who needs this feature, go give it a +1!

For those of you who aren't following the Chromium ticket linked above or haven't found a solution, this is working for me. Chrome is updated to v65, and chromedriver and selenium are both up to date as of 4/16/18.

    from selenium import webdriver

    options = webdriver.ChromeOptions()
    prefs = {'download.prompt_for_download': False,
             'download.directory_upgrade': True,
             'safebrowsing.enabled': False,
             'safebrowsing.disable_download_protection': True}

    options.add_argument('--headless')
    options.add_experimental_option('prefs', prefs)
    driver = webdriver.Chrome('chromedriver.exe', chrome_options=options)
    # Register chromedriver's send_command endpoint so a DevTools command can be sent
    driver.command_executor._commands["send_command"] = ("POST", '/session/$sessionId/chromium/send_command')
    driver.desired_capabilities['browserName'] = 'ur mum'
    # Allow downloads in headless mode and set the directory they are saved to
    params = {'cmd': 'Page.setDownloadBehavior', 'params': {'behavior': 'allow', 'downloadPath': r'C:\chickenbutt'}}
    driver.execute("send_command", params)
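The snippet above patches a private command table to reach the DevTools `Page.setDownloadBehavior` command. Newer Selenium releases than the ones discussed in this thread expose `execute_cdp_cmd` on the Chrome driver, which can send the same command directly. The wrapper below is a minimal sketch under that assumption. The function name `allow_headless_downloads` is made up for illustration, and it has not been checked against the exact versions mentioned here.

```python
def allow_headless_downloads(driver, download_dir):
    """Ask Chrome, via DevTools, to save downloads into download_dir.

    Assumes driver is a Selenium Chrome driver that provides
    execute_cdp_cmd (available in newer Selenium releases).
    """
    params = {"behavior": "allow", "downloadPath": download_dir}
    return driver.execute_cdp_cmd("Page.setDownloadBehavior", params)
```

It would be called once after the driver is created, for example `allow_headless_downloads(driver, r'C:\chickenbutt')`, before triggering the download.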

If you're getting a "Failed-file path too long" error when downloading, make sure that the download path doesn't have a trailing space, slash, or backslash. The path must also use backslashes only. I have no idea why.
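The path rules above can be applied mechanically before passing the path to `Page.setDownloadBehavior`. This is a small sketch of that clean-up. The helper name `normalize_download_path` is made up for illustration, it only encodes the observations in this answer (Windows paths, backslashes only, nothing trailing), and it does not special-case a bare drive root such as `C:\`.

```python
def normalize_download_path(path):
    """Return path with backslashes only and no trailing space or separator."""
    # Drop surrounding whitespace, then convert forward slashes to backslashes
    cleaned = path.strip().replace("/", "\\")
    # Remove any trailing mix of backslashes and spaces
    return cleaned.rstrip("\\ ")
```

For example, `normalize_download_path('C:/chickenbutt/ ')` gives `C:\chickenbutt`, which can then be used as the `downloadPath` value.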
