How to save an opened page as PDF in Selenium (Python)

I have tried every solution I could find online for printing a page that is open in Selenium (Python) to a PDF. However, the print dialog shows up and then goes away after a second or two, with no PDF saved.

Here is the code I am trying, based on this answer: https://stackoverflow.com/a/43752129/3973491

I am coding on a Mac running macOS Mojave 10.14.5.

from selenium import webdriver
from selenium.webdriver.support.select import Select
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import WebDriverException
import time
import json

options = Options()
appState = {
    "recentDestinations": [
        {
            "id": "Save as PDF",
            "origin": "local"
        }
    ],
    "selectedDestinationId": "Save as PDF",
    "version": 2
}

profile = {'printing.print_preview_sticky_settings.appState': json.dumps(appState)}
# profile = {'printing.print_preview_sticky_settings.appState':json.dumps(appState),'savefile.default_directory':downloadPath}
options.add_experimental_option('prefs', profile)
options.add_argument('--kiosk-printing')
CHROMEDRIVER_PATH = '/usr/local/bin/chromedriver'

driver = webdriver.Chrome(options=options, executable_path=CHROMEDRIVER_PATH)
driver.implicitly_wait(5)
driver.get(url)
driver.execute_script('window.print();')

$ chromedriver --v
ChromeDriver 75.0.3770.90 (a6dcaf7e3ec6f70a194cc25e8149475c6590e025-refs/branch-heads/3770@{#1003})

Any hints or solutions for printing the open HTML page to a PDF would be appreciated. I have spent hours trying to make this work. Thank you!


Update on 2019-07-11:

My question has been identified as a duplicate, but (a) the other question seems to be using JavaScript code, and (b) the answer there does not solve the problem raised in this question, which may have to do with more recent software versions. The Chrome version being used is 75.0.3770.100 (Official Build) (64-bit), and chromedriver is ChromeDriver 75.0.3770.90, on macOS Mojave. The script is running on Python 3.7.3.

Update on 2019-07-11:

I changed the code to:

from selenium import webdriver
import json

chrome_options = webdriver.ChromeOptions()
settings = {
    "appState": {
        "recentDestinations": [{
            "id": "Save as PDF",
            "origin": "local",
            "account": "",
        }],
        "selectedDestinationId": "Save as PDF",
        "version": 2
    }
}
prefs = {'printing.print_preview_sticky_settings': json.dumps(settings)}
chrome_options.add_experimental_option('prefs', prefs)
chrome_options.add_argument('--kiosk-printing')
CHROMEDRIVER_PATH = '/usr/local/bin/chromedriver'
driver = webdriver.Chrome(chrome_options=chrome_options, executable_path=CHROMEDRIVER_PATH)
driver.get("https://google.com")
driver.execute_script('window.print();')
driver.quit()

And now, nothing happens. Chrome launches, loads the URL, and the print dialog appears, but then nothing seems to happen: there is nothing in the default printer queue and no PDF either. I even searched for the PDF files by looking up "Recent Files" on the Mac.

The answer here worked when I did not have any other printer set up in my OS. But when I had another default printer, it did not work.

I don't understand why, but making this small change seems to work.

from selenium import webdriver
import json

chrome_options = webdriver.ChromeOptions()
settings = {
    "recentDestinations": [{
        "id": "Save as PDF",
        "origin": "local",
        "account": "",
    }],
    "selectedDestinationId": "Save as PDF",
    "version": 2
}
prefs = {'printing.print_preview_sticky_settings.appState': json.dumps(settings)}
chrome_options.add_experimental_option('prefs', prefs)
chrome_options.add_argument('--kiosk-printing')
CHROMEDRIVER_PATH = '/usr/local/bin/chromedriver'
driver = webdriver.Chrome(chrome_options=chrome_options, executable_path=CHROMEDRIVER_PATH)
driver.get("https://google.com")
driver.execute_script('window.print();')
driver.quit()
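
A likely explanation, as far as I can tell: the only difference from the version in the question's second update is the shape of the preference. There the whole `printing.print_preview_sticky_settings` entry was set to a JSON string with `appState` nested inside it, while here the JSON string is stored directly under the full dotted key `printing.print_preview_sticky_settings.appState`. The sketch below isolates that step in a small helper. The names `build_print_prefs` and `PREF_KEY` are made up for illustration, and `savefile.default_directory` is taken from the other snippets on this page.

```python
import json

# Full dotted preference key, as used in the working snippet above.
PREF_KEY = "printing.print_preview_sticky_settings.appState"


def build_print_prefs(download_dir=None):
    """Return a prefs dict that preselects the "Save as PDF" destination.

    Hypothetical helper: the result is meant to be passed to
    chrome_options.add_experimental_option('prefs', ...).
    """
    app_state = {
        "recentDestinations": [
            {"id": "Save as PDF", "origin": "local", "account": ""}
        ],
        "selectedDestinationId": "Save as PDF",
        "version": 2,
    }
    # The state itself is serialized to a JSON string; it is not wrapped
    # in an extra {"appState": ...} layer.
    prefs = {PREF_KEY: json.dumps(app_state)}
    if download_dir is not None:
        # Optional: directory where the PDF should be written.
        prefs["savefile.default_directory"] = download_dir
    return prefs


if __name__ == "__main__":
    print(build_print_prefs("/tmp/pdfs"))
```

The returned dict would replace `prefs` in the snippet above; `--kiosk-printing` is still needed so that the dialog is confirmed automatically.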

This solution is not very good, but you can take a screenshot and convert it to PDF with Pillow:

from selenium import webdriver
from io import BytesIO
from PIL import Image

driver = webdriver.Chrome(executable_path='path to your driver')
driver.get('your url here')
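# Screenshot the <body> element and load the PNG bytes into Pillow.
img = Image.open(BytesIO(driver.find_element_by_tag_name('body').screenshot_as_png))
# Convert to RGB first: some Pillow versions cannot write RGBA images as PDF.
img.convert('RGB').save('filename.pdf', "PDF", quality=100)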
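
If a real (text-based) PDF is wanted rather than an image of the page, a different route that skips the print dialog entirely is the Chrome DevTools Protocol command `Page.printToPDF`, which Selenium's Chrome driver exposes through `execute_cdp_cmd`. This is a swapped-in technique, not what the answer above does. The sketch below is minimal and rests on some assumptions: the helper name `print_page_to_pdf` is made up, and to my understanding older Chrome versions only supported this command in headless mode.

```python
import base64


def print_page_to_pdf(driver, output_path, **cdp_params):
    """Render the driver's current page to a PDF file via Page.printToPDF.

    `driver` is assumed to be a Selenium Chrome WebDriver; only its
    execute_cdp_cmd method is used. Returns the number of bytes written.
    """
    # printBackground mirrors the "background graphics" print option.
    params = {"printBackground": True}
    params.update(cdp_params)
    result = driver.execute_cdp_cmd("Page.printToPDF", params)
    # Chrome returns the PDF as a base64 string under the "data" key.
    pdf_bytes = base64.b64decode(result["data"])
    with open(output_path, "wb") as fh:
        fh.write(pdf_bytes)
    return len(pdf_bytes)
```

With a running session this would be called after `driver.get(...)`, for example `print_page_to_pdf(driver, 'page.pdf', landscape=True)`.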

You can use the following code to print PDFs in A5 size with background CSS enabled:

import os
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import json
import time

chrome_options = webdriver.ChromeOptions()

settings = {
    "recentDestinations": [{
        "id": "Save as PDF",
        "origin": "local",
        "account": ""
    }],
    "selectedDestinationId": "Save as PDF",
    "version": 2,
    "isHeaderFooterEnabled": False,
    "mediaSize": {
        "height_microns": 210000,
        "name": "ISO_A5",
        "width_microns": 148000,
        "custom_display_name": "A5"
    },
    "customMargins": {},
    "marginsType": 2,
    "scaling": 175,
    "scalingType": 3,
    "scalingTypePdf": 3,
    "isCssBackgroundEnabled": True
}

mobile_emulation = { "deviceName": "Nexus 5" }
chrome_options.add_experimental_option("mobileEmulation", mobile_emulation)
chrome_options.add_argument('--enable-print-browser')
#chrome_options.add_argument('--headless')

prefs = {
    'printing.print_preview_sticky_settings.appState': json.dumps(settings),
    'savefile.default_directory': '<path>'
}
chrome_options.add_argument('--kiosk-printing')
chrome_options.add_experimental_option('prefs', prefs)

for dirpath, dirnames, filenames in os.walk('<source path>'):
    for fileName in filenames:
        print(fileName)
        driver = webdriver.Chrome("./chromedriver", options=chrome_options)
        driver.get(f'file://{os.path.join(dirpath, fileName)}')
        time.sleep(7)
        driver.execute_script('window.print();')
        driver.close()
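
The `mediaSize` values above are given in microns (thousandths of a millimetre), so A5 at 148 mm x 210 mm becomes 148000 x 210000. A small sketch of that arithmetic follows. The helper name `media_size` is made up for illustration; the dictionary keys are the ones used in the settings above.

```python
MICRONS_PER_MM = 1000


def media_size(name, width_mm, height_mm, display_name):
    """Build a mediaSize entry from paper dimensions in millimetres."""
    return {
        "name": name,
        "width_microns": width_mm * MICRONS_PER_MM,
        "height_microns": height_mm * MICRONS_PER_MM,
        "custom_display_name": display_name,
    }


# A5 paper is 148 mm wide and 210 mm tall.
a5 = media_size("ISO_A5", 148, 210, "A5")

if __name__ == "__main__":
    print(a5)
```

The resulting dict has the same values as the hand-written `mediaSize` block in the settings.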

Use pdfkit instead:

import pdfkit

pdfkit.from_url('http://stackoverflow.com', 'page.pdf')
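
Note that pdfkit is a wrapper around the `wkhtmltopdf` program, which has to be installed separately, and that it fetches the URL itself instead of reusing the page already open in Selenium (so login state from the browser session is not carried over). The sketch below checks for the binary before converting. The helper names `find_wkhtmltopdf` and `url_to_pdf` are made up, and it assumes `wkhtmltopdf` is expected on the PATH.

```python
import shutil


def find_wkhtmltopdf():
    """Return the path of the wkhtmltopdf binary, or None if it is not on PATH."""
    return shutil.which("wkhtmltopdf")


def url_to_pdf(url, output_path):
    """Convert a URL to PDF with pdfkit, failing early if wkhtmltopdf is missing."""
    binary = find_wkhtmltopdf()
    if binary is None:
        raise RuntimeError("wkhtmltopdf was not found on PATH; install it first")
    # Imported here so the check above also works when pdfkit is not installed.
    import pdfkit
    config = pdfkit.configuration(wkhtmltopdf=binary)
    return pdfkit.from_url(url, output_path, configuration=config)
```

Calling `url_to_pdf('http://stackoverflow.com', 'page.pdf')` would then behave like the one-liner above, with a clearer error when the binary is absent.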

Here is the solution I use on Windows:

  • First, download ChromeDriver from http://chromedriver.chromium.org/downloads and install Selenium.

  • Then run this code (based on the accepted answer, slightly modified to work on Windows):

     import json
     from selenium import webdriver

     chrome_options = webdriver.ChromeOptions()
     settings = {
         "recentDestinations": [{
             "id": "Save as PDF",
             "origin": "local",
             "account": ""
         }],
         "selectedDestinationId": "Save as PDF",
         "version": 2
     }
     prefs = {'printing.print_preview_sticky_settings.appState': json.dumps(settings)}
     chrome_options.add_experimental_option('prefs', prefs)
     chrome_options.add_argument('--kiosk-printing')
     browser = webdriver.Chrome(r"chromedriver.exe", options=chrome_options)
     browser.get("https://google.com/")
     browser.execute_script('window.print();')
     browser.close()

I would suggest downloading the page source HTML. In VB.NET this can be done like so: Dim Html As String = webdriver.PageSource. I am not sure how it is done in Python, but I am sure it is very similar. Once you have the source, you can select the parts of the page you want to save using an HTML parser, or by parsing it manually with string-parsing code. Once the HTML for the part you want to save is stored in a string, use an HTML-to-PDF converter library or program. There are lots of these for programming languages like C# and VB.NET. I don't know of any for Python, but I'm sure some exist; just do some research (some are free and some are expensive).
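
In Python the equivalent of `webdriver.PageSource` is the `driver.page_source` attribute. The "select the part you want" step could then look roughly like the sketch below, which uses only the standard library's `html.parser` to pull out the outer HTML of the first element with a given id. The names `FragmentExtractor` and `extract_fragment` are made up, and the approach assumes reasonably well-formed HTML (it tracks nesting by counting tags).

```python
from html.parser import HTMLParser

# Elements that never have a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class FragmentExtractor(HTMLParser):
    """Collects the outer HTML of the first element whose id matches."""

    def __init__(self, element_id):
        super().__init__(convert_charrefs=False)
        self.element_id = element_id
        self.depth = 0      # nesting depth inside the target element
        self.parts = []     # raw pieces of the extracted fragment
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0 and dict(attrs).get("id") != self.element_id:
            return
        self.parts.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.depth += 1
        elif self.depth == 0:
            self.done = True  # the target itself was a void element

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied only inside the target.
        if self.depth > 0 and not self.done:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth == 0 or self.done or tag in VOID_TAGS:
            return
        self.parts.append(f"</{tag}>")
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth > 0 and not self.done:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth > 0 and not self.done:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth > 0 and not self.done:
            self.parts.append(f"&#{name};")


def extract_fragment(html, element_id):
    """Return the outer HTML of the element with the given id, or ''."""
    parser = FragmentExtractor(element_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

With a live session, `extract_fragment(driver.page_source, 'main')` would return a string that can be handed to an HTML-to-PDF converter, for instance `pdfkit.from_string` from the pdfkit answer above.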
