
Convert HTML to PDF using Java

I have an HTML document and want to convert it into an in-memory PDF, but I cannot find a good library for converting HTML to PDF.

I tried ITextRenderer together with Jsoup, but it throws this exception: Can't load the XML resource (using TRaX transformer). org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 3; The markup in the document preceding the root element must be well-formed.

Here's my code

Document document = Jsoup.parse(template, "UTF-8");
document.outputSettings().syntax(Document.OutputSettings.Syntax.html);
ByteArrayOutputStream binaryOutput = new ByteArrayOutputStream();
renderer.setDocumentFromString(document.html());
renderer.layout();
renderer.createPDF(binaryOutput);

  

You are looking for a way to render HTML and store the result as a PDF. In this question people tried to render XML (which is close to HTML; XHTML in particular is XML) and ultimately turn it into a PDF: Java Render XML Document as PDF

But coming to your error message: the error relates to your input document, which you did not show. The part of a document preceding the root element could look like this:

<?xml version="1.0"?>
<!-- comment -->
<?processinginstruction whatever parameters?>
<rootElement/>

So everything before <rootElement/> is what your error message is pointing to. I guess you are starting from an HTML document, and it may contain something that the Jsoup HTML parser struggles with. Unless you share that document with us, you will have to figure it out yourself.
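To see what the parser objects to, one option is to feed the string to the JDK's own XML parser before handing it to ITextRenderer. The following is only a minimal sketch using the standard library; the class name PrologCheck, the method firstXmlError and the sample strings are made up for illustration. It assumes that the HTML output contains a lowercase <!doctype html>, which an XML parser rejects at line 1, column 3, because XML only accepts the uppercase <!DOCTYPE.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper, not part of Jsoup or Flying Saucer.
class PrologCheck {

    /**
     * Parses the text as XML with the JDK parser.
     * Returns null if it is well-formed, otherwise a short description
     * of the first problem, including its line and column.
     */
    static String firstXmlError(String text) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(text)));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber()
                    + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        // Assumed sample inputs: the same page with an HTML-style
        // and an XML-style document type declaration.
        String htmlStyle = "<!doctype html><html><body><p>Hi</p></body></html>";
        String xmlStyle = "<!DOCTYPE html><html><body><p>Hi</p></body></html>";

        System.out.println("HTML-style prolog: " + firstXmlError(htmlStyle));
        System.out.println("XML-style prolog:  " + firstXmlError(xmlStyle));
    }
}
```

If the check reports a problem in the prolog, the commonly suggested remedy with Jsoup is to switch the output syntax from Document.OutputSettings.Syntax.html to Document.OutputSettings.Syntax.xml so that the serialized string is well-formed XHTML. Whether that is enough depends on the template, which was not shown.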

You can try this class: com.itextpdf.html2pdf.HtmlConverter

With it, all you have to do is call HtmlConverter.convertToPdf(tempFileHtml, tempFilePdf); and export the result. It copes well with malformed XML/HTML. I have used it and am happy with the results :)

A popular tool for HTML-to-PDF conversion is IronPDF for Java (also available for .NET).

With the addition of the following to pom.xml (changing the version to latest):

<dependencies>

    <dependency>
        <groupId>com.ironsoftware</groupId>
        <artifactId>ironpdf</artifactId>
        <version>2022.11.0</version>
    </dependency>

    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-simple</artifactId>
        <version>2.0.3</version>
    </dependency>

</dependencies>

I was able to render pixel-perfect PDFs that looked exactly like my HTML. An example is:

import com.ironsoftware.ironpdf.*;
import java.nio.file.Paths;

// Render the HTML as a PDF. The result is stored in myPdf as a PdfDocument.
PdfDocument myPdf = PdfDocument.renderHtmlAsPdf("<h1> ~Hello World~ </h1> Made with IronPDF!");
 
// Save the PdfDocument to a file
myPdf.saveAs(Paths.get("html_saved.pdf"));

// Or with a local file:
myPdf = PdfDocument.renderHtmlFileAsPdf("example.html");
myPdf.saveAs(Paths.get("html_file_saved.pdf"));

// Even works with web pages:
myPdf = PdfDocument.renderUrlAsPdf("https://ironpdf.com");
myPdf.saveAs(Paths.get("url.pdf"));

Disclaimer: I am affiliated with IronPDF and will be more than happy to answer any questions you may have about the software.

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address.
