Converting multiple html files to pdf using pdfkit in Python

Question

I am trying to convert multiple html files to a single pdf using pdfkit. Here is my code:

import time

from bs4 import BeautifulSoup
from selenium import webdriver
import pdfkit

driver = webdriver.Chrome()
driver.get('https://www.linkedin.com/in/jaypratappandey/')
time.sleep(40)  # wait for the page to finish loading
soup = BeautifulSoup(driver.page_source, 'lxml')

# Write each selected section to its own html file; the with-blocks
# close (and flush) the files before pdfkit reads them.
with open('tophtmlfile.html', 'w') as top:
    for name in soup.select('.pv-top-card-section__body'):
        top.write("%s" % name)

with open('htmlfile.html', 'w') as f:
    for item in soup.select('.pv-oc.ember-view'):
        f.write("%s" % item)

pdfkit.from_file(['tophtmlfile.html', 'htmlfile.html'], 'jayprofile.pdf')

driver.quit()

This code gives the following error:

Traceback (most recent call last):
  File "lkdndata.py", line 23, in <module>
    pdfkit.from_file(['tophtmlfile.html', 'htmlfile.html'], 'ankurprofile.pdf')
  File "/usr/local/lib/python3.5/dist-packages/pdfkit/api.py", line 49, in from_file
    return r.to_pdf(output_path)
  File "/usr/local/lib/python3.5/dist-packages/pdfkit/pdfkit.py", line 156, in to_pdf
    raise IOError('wkhtmltopdf reported an error:\n' + stderr)
OSError: wkhtmltopdf reported an error:
Error: This version of wkhtmltopdf is build against an unpatched version of QT, and does not support more then one input document.
Exit with code 1, due to unknown error.

Answer 1

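I had the same error. It is most likely caused by an inconsistent Qt installation: the wkhtmltopdf binary was built against a Qt version without the required patches. Try running

wkhtmltopdf

in the terminal and check whether the output mentions "Reduced Functionality".

If it does, my assumption is correct, and your safest option is to compile wkhtmltopdf from source.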
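
The check described in this answer can also be scripted. The following is a minimal sketch, assuming that wkhtmltopdf is on the PATH and that an unpatched build prints a "Reduced Functionality" section when run without arguments, as the answer describes. The helper names reports_reduced_functionality and wkhtmltopdf_usage_text are hypothetical and not part of pdfkit or wkhtmltopdf.

```python
import shutil
import subprocess


def reports_reduced_functionality(usage_text):
    """Return True if the given wkhtmltopdf output mentions reduced functionality."""
    return "reduced functionality" in usage_text.lower()


def wkhtmltopdf_usage_text(binary="wkhtmltopdf"):
    """Run the binary with no arguments and return whatever it printed.

    Returns None when the binary cannot be found on the PATH.
    """
    if shutil.which(binary) is None:
        return None
    result = subprocess.run(
        [binary],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # The usage text may go to either stream, so look at both.
    return result.stdout + result.stderr
```

Calling wkhtmltopdf_usage_text() and passing a non-None result to reports_reduced_functionality() gives the same answer as reading the terminal output by eye. The exact wording of the message is taken from this answer, so it is worth confirming against your own build.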

Answer 2

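The solution I found is to first merge the html files into a single file and then convert that file with pdfkit. In your case, save tophtmlfile.html and htmlfile.html together in the same directory and replace the path below with the path to that directory.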

import os

import pdfkit

# path to folder containing html files
path = "/home/ec2-user/data-science-processes/src/results/"

def multiple_html_to_pdf(path):
    """Convert multiple html files to a single pdf.

    args: path to directory containing html files
    """
    parts = []
    # sort the names so the merge order is deterministic
    for file in sorted(os.listdir(path)):
        if file.endswith(".html"):
            print(file)
            # collect the content of each html file
            with open(os.path.join(path, file), 'r') as f:
                parts.append(f.read())
    # wrap everything in one document; joining avoids repeated string
    # replacement, which duplicates content if a file contains </body></html>
    merged_html = '<html><head></head><body>' + ''.join(parts) + '</body></html>'
    # save merged html, then convert the same file that was just written
    merged_path = 'merged.html'
    with open(merged_path, 'w') as f:
        f.write(merged_html)
    pdfkit.from_file(merged_path, 'Report.pdf')

multiple_html_to_pdf(path)
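
For the two files from the question, the same merge-then-convert idea can be sketched with an explicit file list instead of a directory scan. This is only a sketch under stated assumptions: the helper names merge_html_files and convert_merged, the default file name merged.html, and the wrapping div elements are illustrative and not part of pdfkit. It also assumes the inputs are HTML fragments, as written by the question's code, rather than complete documents. The page-break-before style is standard CSS; whether your wkhtmltopdf build honors it should be verified.

```python
def merge_html_files(input_paths, output_path):
    """Concatenate HTML fragments, in the given order, into one document."""
    sections = []
    for index, input_path in enumerate(input_paths):
        with open(input_path, "r", encoding="utf-8") as f:
            fragment = f.read()
        # Ask for a page break before every section except the first.
        style = ' style="page-break-before: always"' if index > 0 else ""
        sections.append("<div%s>%s</div>" % (style, fragment))
    merged = (
        '<html><head><meta charset="utf-8"></head><body>'
        + "\n".join(sections)
        + "</body></html>"
    )
    with open(output_path, "w", encoding="utf-8") as f:
        f.write(merged)
    return output_path


def convert_merged(input_paths, pdf_path, merged_path="merged.html"):
    """Merge the inputs, then hand the single merged file to pdfkit."""
    import pdfkit  # imported here so the merge step works without pdfkit

    merge_html_files(input_paths, merged_path)
    return pdfkit.from_file(merged_path, pdf_path)
```

With the question's file names, the call would be convert_merged(['tophtmlfile.html', 'htmlfile.html'], 'jayprofile.pdf'). Because wkhtmltopdf receives only one input document, the "does not support more then one input document" error should not be triggered.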


2 answers

Answer 1
0 2017-12-13 14:27:08

Answer 2
0 2021-12-09 09:10:52
