
Using Selenium with Python and PhantomJS to download file to filesystem

I've been grappling with using PhantomJS/Selenium/python-selenium to download a file to the filesystem. I can easily navigate through the DOM and click, hover, etc. Downloading a file, however, is proving to be quite troublesome. I tried a headless approach with Firefox and pyvirtualdisplay, but that didn't work well either and was unbelievably slow. I know that CasperJS allows file downloads. Does anyone know how to integrate CasperJS with Python, or how to use PhantomJS to download files? Much appreciated.

Although this question is quite old, downloading files through PhantomJS is still a problem. We can, however, use PhantomJS to get the download link and collect all the cookies that are needed, such as CSRF tokens. Then we can use requests to perform the actual download:

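import requests
from selenium import webdriver

driver = webdriver.PhantomJS()
driver.get('page_with_download_link')
# find_element_by_id returns a WebElement, so read its href to get the URL
# (this assumes the element is a link with an href attribute)
download_link = driver.find_element_by_id('download_link').get_attribute('href')

# Copy the browser's cookies into a requests session
session = requests.Session()
for cookie in driver.get_cookies():
    session.cookies.set(cookie['name'], cookie['value'])

response = session.get(download_link)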

The actual file content should now be in response.content. We can then write it to disk with open or do whatever else we want with it.
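As a rough sketch of the "write it with open" step, the helper below streams a response body to disk in chunks rather than holding it all in memory. The name save_download and its chunk_size default are made up for illustration; the only thing it assumes about the response is that it has an iter_content method, as requests responses do.

```python
def save_download(response, path, chunk_size=8192):
    """Write the body of a requests-style response to `path`.

    Returns the number of bytes written. `save_download` is a
    hypothetical helper name, not part of requests or Selenium.
    """
    written = 0
    with open(path, "wb") as fh:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty keep-alive chunks
                fh.write(chunk)
                written += len(chunk)
    return written
```

With the session from the answer above, this could be used as response = session.get(download_link, stream=True), then response.raise_for_status(), then save_download(response, 'downloaded_file'). The output filename here is only a placeholder.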

PhantomJS doesn't currently support file downloads. Relevant issues with workarounds:

As far as I understand, you have at least 3 options:

  • switch to CasperJS (which means leaving Python behind)
  • try running a regular browser headlessly under xvfb
  • switch to a normal non-headless browser
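For the last two options, one common approach is to configure Firefox so that it saves files without showing a download dialog. The sketch below only builds the preference dictionary; firefox_download_prefs is a hypothetical helper, and the directory and MIME types you pass in depend on your own setup.

```python
def firefox_download_prefs(download_dir, mime_types):
    """Return Firefox preferences that save downloads silently.

    `firefox_download_prefs` is a hypothetical helper; the preference
    names themselves are Firefox settings.
    """
    return {
        # 2 = save to the custom directory given in browser.download.dir
        "browser.download.folderList": 2,
        "browser.download.dir": download_dir,
        "browser.download.manager.showWhenStarting": False,
        # MIME types that are saved to disk without asking
        "browser.helperApps.neverAsk.saveToDisk": ",".join(mime_types),
    }
```

Each name and value pair would then be applied with set_preference on a webdriver.FirefoxProfile() before the driver is started, whether the browser runs on a real display or under xvfb.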

Here are some links that might also help:

My use case required a form submission to retrieve the file. I was able to accomplish this using the driver's execute_async_script() function.

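# `target` (the id of the file that was clicked) must already be defined
js = '''
    // With no extra script arguments, Selenium's completion callback is arguments[0]
    var callback = arguments[0];
    var theForm = document.forms['theFormId'];
    var data = new FormData();
    data.append('eventTarget', "''' + target + '''"); // this is the id of the file clicked
    data.append('otherFormField', theForm.otherFormField.value);

    var xhr = new XMLHttpRequest();
    xhr.open('POST', theForm.action, true);
'''

# Send each cookie value as a request header as well; the browser already
# attaches the cookies themselves to same-origin requests
for cookie in driver.get_cookies():
    js += ' xhr.setRequestHeader("' + cookie['name'] + '", "' + cookie['value'] + '"); '

js += '''
    // Hand the response body back to Python once the request completes
    xhr.onload = function () {
        callback(this.responseText);
    };
    xhr.send(data);
'''

# Allow the request up to 30 seconds before the script times out
driver.set_script_timeout(30)
file = driver.execute_async_script(js)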

It is not possible that way. You can use other tools, such as wget or curl, to download the file.

Use Firefox to find the right request, use Selenium to get the values that request needs, and finally call curl to download the file:

import subprocess

curlCall = "curl 'http://www_sitex_org/descarga.jsf' -H '...allCurlRequest....' > file.xml"
subprocess.call(curlCall, shell=True)
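A variation on the same idea is to pass curl an argument list instead of a shell string, so header values taken from Selenium need no shell quoting. This is only a sketch: build_curl_command and download_with_curl are hypothetical helper names, and the headers you pass in are whatever the real request in your browser shows.

```python
import subprocess


def build_curl_command(url, headers, output_path):
    """Return a curl argument list that saves `url` to `output_path`."""
    command = [
        "curl",
        "--fail",        # exit non-zero on HTTP errors
        "--silent",
        "--show-error",
        "--location",    # follow redirects
        "--output", output_path,
    ]
    for name, value in headers.items():
        command += ["-H", "{}: {}".format(name, value)]
    command.append(url)
    return command


def download_with_curl(url, headers, output_path):
    """Run curl; raises subprocess.CalledProcessError if curl fails."""
    subprocess.check_call(build_curl_command(url, headers, output_path))
```

Because no shell is involved, the output file is given with --output rather than a > redirect.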
