
How to download an embedded PDF from a webpage using Selenium?

I want to use Selenium to download an embedded PDF from a webpage, like the one in this image: (image: embedded PDF)

For example, a page like this: https://www.sebi.gov.in/enforcement/orders/jun-2019/adjudication-order-in-respect-of-three-entities-in-the-matter-of-prism-medico-and-pharmacy-ltd-_43323.html

I tried the code below, but it did not work.

def download_pdf(lnk):

    from selenium import webdriver
    from time import sleep

    options = webdriver.ChromeOptions()

    download_folder = "/*My folder*/"    

    profile = {"plugins.plugins_list": [{"enabled": False,
                                         "name": "Chrome PDF Viewer"}],
               "download.default_directory": download_folder,
               "download.extensions_to_open": ""}

    options.add_experimental_option("prefs", profile)

    print("Downloading file from link: {}".format(lnk))

    driver = webdriver.Chrome('/*Path of chromedriver*/',chrome_options = options)
    driver.get(lnk)
    imp_by1 = driver.find_element_by_id("secondaryToolbarToggle")
    imp_by1.click()
    imp_by = driver.find_element_by_id("secondaryDownload")
    imp_by.click()

    print("Status: Download Complete.")

    driver.close()

download_pdf('https://www.sebi.gov.in/enforcement/orders/jun-2019/adjudication-order-in-respect-of-three-entities-in-the-matter-of-prism-medico-and-pharmacy-ltd-_43323.html')
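This attempt has two weak spots. First, the `plugins.plugins_list` preference no longer controls Chrome's built-in PDF viewer in current Chrome versions. The preference usually recommended for saving PDFs instead of displaying them is `plugins.always_open_pdf_externally`. It only matters when the browser navigates to the PDF file itself, not to an HTML viewer page. Second, `print("Status: Download Complete.")` runs as soon as the button is clicked, before Chrome has finished writing the file. The sketch below addresses both. The helper names `pdf_download_prefs` and `wait_for_download` are hypothetical. It assumes Chrome's usual behaviour of writing unfinished downloads as `*.crdownload` files.

```python
import os
import time


def pdf_download_prefs(download_folder):
    """Chrome prefs (hypothetical helper) that save PDFs into download_folder
    instead of opening them; pass to options.add_experimental_option("prefs", ...)."""
    return {
        "download.default_directory": os.path.abspath(download_folder),
        "download.prompt_for_download": False,
        # Save PDFs rather than rendering them in Chrome's PDF viewer
        "plugins.always_open_pdf_externally": True,
    }


def wait_for_download(download_folder, timeout=30.0, poll=0.5):
    """Poll download_folder until a finished .pdf is present and return its path.

    Assumption: Chrome keeps in-progress downloads as *.crdownload files, so a
    .pdf only counts as complete when no .crdownload file is left.
    Raises TimeoutError if nothing finishes within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        names = os.listdir(download_folder)
        pending = [n for n in names if n.endswith(".crdownload")]
        pdfs = [n for n in names if n.lower().endswith(".pdf")]
        if pdfs and not pending:
            # Several PDFs may exist; return the most recently modified one
            paths = [os.path.join(download_folder, n) for n in pdfs]
            return max(paths, key=os.path.getmtime)
        time.sleep(poll)
    raise TimeoutError(f"no finished PDF in {download_folder} after {timeout}s")
```

Usage would be `options.add_experimental_option("prefs", pdf_download_prefs(folder))` before starting the driver, then `path = wait_for_download(folder)` after the click. Start with an empty download folder, because any PDF already present would be returned immediately.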

Any help is appreciated.

Thanks in advance!

Here you go; the explanation is in the code comments:


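import os

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# initialise browser (chromedriver is expected in the current working directory)
browser = webdriver.Chrome(service=Service(os.path.join(os.getcwd(), 'chromedriver')))
# load the page that contains the iframe
browser.get('https://www.sebi.gov.in/enforcement/orders/jun-2019/adjudication-order-in-respect-of-three-entities-in-the-matter-of-prism-medico-and-pharmacy-ltd-_43323.html')

# find the PDF viewer's URL in the iframe's src
# (Selenium 4 API: find_element(By...) replaces the removed find_element_by_* helpers)
pdf_url = browser.find_element(By.TAG_NAME, 'iframe').get_attribute("src")
# open that URL as the top-level page, so its toolbar can be found without switching frames
browser.get(pdf_url)
# click the viewer's download button
download = browser.find_element(By.XPATH, '//*[@id="download"]')
download.click()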
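If the iframe points at a PDF.js-style viewer, the PDF itself may be reachable without clicking anything. The stock PDF.js viewer is opened as `.../viewer.html?file=<pdf url>`, so the document address can be read from that `file` query parameter and fetched directly. This is an assumption about how the page embeds its viewer; I have not verified it for the SEBI page. The helper name `pdf_url_from_viewer` is hypothetical.

```python
from urllib.parse import parse_qs, urljoin, urlparse


def pdf_url_from_viewer(viewer_url):
    """Return the PDF address behind a PDF.js-style viewer URL (hypothetical helper).

    Assumption: the viewer receives the document as ?file=<url>, as the stock
    PDF.js viewer does. A relative `file` value is resolved against the viewer
    URL. Without a `file` parameter the URL is returned unchanged, since it may
    already point at the PDF.
    """
    query = parse_qs(urlparse(viewer_url).query)
    files = query.get("file")
    if not files:
        return viewer_url
    # parse_qs has already percent-decoded the value
    return urljoin(viewer_url, files[0])
```

For example, `pdf_url_from_viewer(browser.find_element(By.TAG_NAME, "iframe").get_attribute("src"))` gives a URL that can be downloaded directly, or opened in a browser configured to save PDFs rather than display them.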

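Here is another way to get the file without clicking a download button. It reads the file URL, copies the session cookies from the driver, and fetches the file with a plain HTTP request. If your tests run on Selenium Grid (on a remote node), this approach also saves the file to your local machine instead of the node.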

import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

import org.openqa.selenium.By;
import org.openqa.selenium.Cookie;
import org.openqa.selenium.WebElement;

// MyPage is your own page-object base class; it provides the WebDriver field 'driver'
public class FileDownloader extends MyPage {

    public void downloadFile() {
        // Grab the file download URL from the iframe (or your download icon/button/element)
        WebElement iframe = driver.findElement(By.tagName("iframe"));
        String src = iframe.getAttribute("src");

        driver.get(src); // driver object from 'MyPage.java'

        // Grab cookies from the current driver session (authenticated cookie information
        // is vital to download the file from 'src')
        StringBuilder cookies = new StringBuilder();
        for (Cookie cookie : driver.manage().getCookies()) {
            if (cookies.length() > 0) {
                cookies.append("; ");
            }
            cookies.append(cookie.getName()).append("=").append(cookie.getValue());
        }

        try {
            HttpURLConnection con = (HttpURLConnection) new URL(src).openConnection();
            con.setRequestMethod("GET");
            con.addRequestProperty("Cookie", cookies.toString());

            // Set your own download path, probably a dynamic file name with a timestamp
            String downloadPath = System.getProperty("user.dir") + File.separator + "file.pdf";

            // try-with-resources closes both streams even if the copy fails
            try (InputStream inputStream = con.getInputStream();
                 OutputStream outputStream = new FileOutputStream(new File(downloadPath))) {
                byte[] buffer = new byte[4096];
                int bytesRead;
                while ((bytesRead = inputStream.read(buffer)) != -1) {
                    outputStream.write(buffer, 0, bytesRead);
                }
            }
        } catch (Exception e) {
            // The file download failed; report it instead of silently ignoring it
            e.printStackTrace();
        }
    }
}
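The cookie-forwarding idea from the Java answer can also be sketched in Python, the question's language. Selenium's Python `driver.get_cookies()` returns a list of dicts with `name` and `value` keys. The sketch turns those into a `Cookie` header and streams the response to disk with the standard library's `urllib.request`. The function names `cookie_header` and `download_with_cookies` are hypothetical. Some servers also check headers such as `User-Agent` or `Referer`, which this sketch does not copy.

```python
import shutil
import urllib.request


def cookie_header(cookies):
    """Build a Cookie header value from Selenium-style cookie dicts,
    i.e. the list returned by driver.get_cookies()."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


def download_with_cookies(url, cookies, path, timeout=60):
    """Fetch `url` outside the browser with the session's cookies attached,
    stream the response body to `path`, and return `path`."""
    header = cookie_header(cookies)
    headers = {"Cookie": header} if header else {}
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response, \
            open(path, "wb") as out:
        shutil.copyfileobj(response, out)
    return path
```

Usage: `download_with_cookies(src, driver.get_cookies(), "file.pdf")`. The request is made by the machine running the script, not by the browser. That is why this approach also works when the browser runs on a remote Grid node.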

This is what my DOM looks like:

<iframe src="/files/downloads/pdfgenerator.aspx" id="frame01">
  #document
  <html>
    <body>
      <embed width="100%" height ="100%" src="about:blank" type="application/pdf" internalid="1234567890">
    </body>
  </html>
</iframe>
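In this DOM the `<embed>` has `src="about:blank"`, so the only usable address is the iframe's `src`, a relative path to a server-side generator (`/files/downloads/pdfgenerator.aspx`). Selenium's `get_attribute("src")` usually returns the resolved absolute URL, but the raw attribute shown here must be joined with the page URL. An endpoint like this may return an HTML error or login page instead of a PDF when the cookies are missing, so checking the saved bytes is worthwhile. PDF files start with the `%PDF-` header. This check is strict, since some readers tolerate a few leading bytes before it. The helper names are hypothetical, and `https://example.com/...` below is a placeholder page URL.

```python
from urllib.parse import urljoin

PDF_MAGIC = b"%PDF-"


def absolute_frame_url(page_url, frame_src):
    """Resolve a raw iframe src such as /files/downloads/pdfgenerator.aspx
    against the URL of the page that contains the iframe."""
    return urljoin(page_url, frame_src)


def looks_like_pdf(path):
    """Return True if the file begins with the %PDF- header."""
    with open(path, "rb") as f:
        return f.read(len(PDF_MAGIC)) == PDF_MAGIC
```

For instance, `absolute_frame_url("https://example.com/reports/view.aspx", "/files/downloads/pdfgenerator.aspx")` yields `https://example.com/files/downloads/pdfgenerator.aspx`. After downloading, `looks_like_pdf(path)` returning `False` suggests the server sent something other than the document.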


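Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.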

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM