
Generating a PDF from third-party HTML in Java

I'm trying to generate a PDF version of a third-party HTML page (actually it is an HTM file). This HTML may change in the future and I have absolutely no control over it. All I want to do is convert it to a PDF.

I have already tried two solutions, iText (with XMLWorker) and Flying-Saucer, but without success so far.

My problem is that the HTML file is far from well-formed. Examples:

    <link rel=File-List href="040602_inds_files/filelist.xml">

    <meta http-equiv=Content-Type content="text/html; charset=windows-1252">

The first one has no closing tag (iText crashes) and the second one has no double quotes around the 'http-equiv' value (Flying-Saucer crashes).
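To make the two failure modes concrete, here is a minimal sketch that feeds each defect to a strict XML parser. It uses the JDK's own `DocumentBuilder` as a stand-in, on the assumption that both converters expect XHTML-style input; it is not the parser either library uses internally, and the class name `StrictXmlCheck` and the shortened markup strings are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a snippet is well-formed XML.
class StrictXmlCheck {

    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the markup as not well-formed.
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not expected here.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Defect 1: <link> is never closed.
        String unclosed = "<head><link rel=\"File-List\" href=\"filelist.xml\"></head>";
        // Defect 2: the http-equiv value has no quotes.
        String unquoted = "<head><meta http-equiv=Content-Type content=\"text/html\"/></head>";
        // Both defects repaired.
        String repaired = "<head><link rel=\"File-List\" href=\"filelist.xml\"/>"
                + "<meta http-equiv=\"Content-Type\" content=\"text/html\"/></head>";

        System.out.println(isWellFormed(unclosed)); // false
        System.out.println(isWellFormed(unquoted)); // false
        System.out.println(isWellFormed(repaired)); // true
    }
}
```

Both constructs are fine for a browser's HTML parser, which is why the page renders normally even though an XML-based toolchain rejects it.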

I have found a lot of posts about this issue, but all of them deal with the poster's own HTML, so the poster can fix it and try again. I can't do that.

This is the page I'm trying to convert.

Here is my iText convert method:

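        public static void convert(PdfWriter writer, Document document, String siteUrl) throws MalformedURLException, IOException {
            // try-with-resources closes the network stream even if parsing fails
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(new URL(siteUrl).openStream()))) {
                // XMLWorker expects well-formed XHTML from the reader
                XMLWorkerHelper.getInstance().parseXHtml(writer, document, reader);
            }
        }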

And here is my Flying-Saucer convert method:

        public static void convertFS(String siteUrl, String fileName) throws com.lowagie.text.DocumentException, IOException {
            // try-with-resources closes the output file even if rendering throws
            try (OutputStream os = new FileOutputStream(fileName)) {
                ITextRenderer renderer = new ITextRenderer();
                renderer.setDocument(siteUrl); // loads the page from the URL
                renderer.layout();
                renderer.createPDF(os);
            }
        }

Any tips? I'm open to other libraries if they are reasonably usable. Thanks in advance.

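You can first parse the HTML file with jsoup, then convert the content into a standard (well-formed) HTML file, and finally use iText to generate the PDF.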
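A minimal sketch of the jsoup clean-up step might look like the following. It assumes jsoup is on the classpath; the class name `HtmlCleaner`, the method `toXhtml`, and the sample markup are hypothetical names chosen for illustration. jsoup parses the tag soup leniently, and switching its output settings to XML syntax makes it serialize the document with quoted attributes and closed tags.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Entities;

// Hypothetical helper: turns lenient HTML into XHTML-style markup.
class HtmlCleaner {

    static String toXhtml(String html, String baseUri) {
        // jsoup tolerates unclosed tags and unquoted attribute values.
        Document doc = Jsoup.parse(html, baseUri);
        doc.outputSettings()
                .syntax(Document.OutputSettings.Syntax.xml)  // close void tags such as <link> and <meta>
                .escapeMode(Entities.EscapeMode.xhtml);      // avoid HTML-only named entities such as &nbsp;
        return doc.html();
    }

    public static void main(String[] args) {
        String messy = "<html><head>"
                + "<link rel=File-List href=\"040602_inds_files/filelist.xml\">"
                + "<meta http-equiv=Content-Type content=\"text/html; charset=windows-1252\">"
                + "</head><body><p>Hello</body></html>";
        System.out.println(toXhtml(messy, ""));
    }
}
```

For a live page, `Jsoup.connect(siteUrl).get()` can replace `Jsoup.parse(...)`. The resulting string can then be handed to either converter from the question, for example by wrapping it in a `StringReader` for `parseXHtml`, or through Flying-Saucer's `setDocumentFromString`. Whether the layout survives depends on how much of the page's CSS the chosen renderer supports, so treat this as a starting point rather than a guaranteed fix.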
