
Converting multiple HTML files into a PDF using pdfkit in Python

I am trying to convert multiple HTML files into a single PDF using pdfkit. This is my code:

import time

import pdfkit
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://www.linkedin.com/in/jaypratappandey/')
time.sleep(40)  # wait for the page to finish loading
soup = BeautifulSoup(driver.page_source, 'lxml')

# Write each profile section to its own HTML file. The with block closes
# (and flushes) both files before pdfkit reads them.
with open('tophtmlfile.html', 'w') as top, open('htmlfile.html', 'w') as f:
    for name in soup.select('.pv-top-card-section__body'):
        top.write("%s" % name)
    for item in soup.select('.pv-oc.ember-view'):
        f.write("%s" % item)

# Passing a list of two input files is the call that fails
pdfkit.from_file(['tophtmlfile.html', 'htmlfile.html'], 'jayprofile.pdf')

driver.quit()

This code gives the following error:

Traceback (most recent call last):
  File "lkdndata.py", line 23, in <module>
    pdfkit.from_file(['tophtmlfile.html', 'htmlfile.html'], 'ankurprofile.pdf')
  File "/usr/local/lib/python3.5/dist-packages/pdfkit/api.py", line 49, in from_file
    return r.to_pdf(output_path)
  File "/usr/local/lib/python3.5/dist-packages/pdfkit/pdfkit.py", line 156, in to_pdf
    raise IOError('wkhtmltopdf reported an error:\n' + stderr)
OSError: wkhtmltopdf reported an error:
Error: This version of wkhtmltopdf is build against an unpatched version of QT, and does not support more then one input document.
Exit with code 1, due to unknown error.

I had the same error. It most likely means that your wkhtmltopdf binary was built against an unpatched version of Qt, as the message itself says, and no compatible patched build is installed. Try running

wkhtmltopdf

in your terminal and check whether the output mentions "Reduced Functionality".

If it does, my assumption is correct, and your safest bet is to compile wkhtmltopdf from source.
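If you would rather do that check from Python, here is a minimal sketch. The helper names mentions_reduced_functionality and probe_wkhtmltopdf are made up for this example, and it assumes the notice appears in what the binary prints when run with no arguments, as described above.

```python
import shutil
import subprocess


def mentions_reduced_functionality(output: str) -> bool:
    """Return True if the text contains the "Reduced Functionality" notice."""
    return "reduced functionality" in output.lower()


def probe_wkhtmltopdf(executable: str = "wkhtmltopdf"):
    """Run the binary with no arguments and inspect what it prints.

    Returns None if the executable is not on PATH, otherwise True/False.
    """
    if shutil.which(executable) is None:
        return None
    result = subprocess.run(
        [executable],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # the usage text may go to either stream
        universal_newlines=True,
        timeout=30,
    )
    return mentions_reduced_functionality(result.stdout)


if __name__ == "__main__":
    print(probe_wkhtmltopdf())
```

A result of True suggests the installed build cannot take more than one input document, which matches the error in the question.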

The solution I found was to first merge the HTML files into one and then convert that single file with pdfkit. In your case, save tophtmlfile.html and htmlfile.html together in the same directory and set path to that directory.

import os

import pdfkit

# path to the folder containing the html files
path = "/home/ec2-user/data-science-processes/src/results/"

def multiple_html_to_pdf(path):
    """Convert multiple html files to a single pdf.

    args: path to directory containing html files
    """
    parts = []
    # sorted() makes the merge order deterministic (os.listdir() order is
    # arbitrary); rename the files if you need a specific order
    for file in sorted(os.listdir(path)):
        if file.endswith(".html"):
            print(file)
            # collect the contents of each html file
            with open(os.path.join(path, file), 'r') as f:
                parts.append(f.read())
    merged_html = '<html><head></head><body>' + ''.join(parts) + '</body></html>'
    # save the merged html, then convert that same file
    with open('merged.html', 'w') as f:
        f.write(merged_html)
    pdfkit.from_file('merged.html', 'Report.pdf')

multiple_html_to_pdf(path)
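If the order of the pages matters, as it does for the two files in the question, a variation is to pass the files explicitly instead of scanning a directory. This is only a sketch: merge_html_files and html_files_to_pdf are hypothetical names, and it assumes the inputs are HTML fragments like the ones the question writes out. It uses pdfkit.from_string so no intermediate merged file is needed.

```python
from pathlib import Path


def merge_html_files(paths):
    """Concatenate the given html files, in order, inside one html document."""
    body = "".join(Path(p).read_text(encoding="utf-8") for p in paths)
    return '<html><head><meta charset="utf-8"></head><body>' + body + "</body></html>"


def html_files_to_pdf(paths, output_pdf):
    """Merge the files and hand wkhtmltopdf a single input document."""
    # Imported here so the merge helper can be used without pdfkit installed
    import pdfkit

    pdfkit.from_string(merge_html_files(paths), output_pdf)
```

For the question's files that would be html_files_to_pdf(['tophtmlfile.html', 'htmlfile.html'], 'jayprofile.pdf'), which keeps the top card first.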
