
How to convert a PDF InputStream to an HTML String?

I have a PDF InputStream of type ByteArrayInputStream.

I need to convert this input to an HTML string.

Is this possible?

Thank you...

One possible starting point is pdf2dom. Please have a look here for how to integrate the dependencies into your project and to read more about any additional dependencies that may be required.

Pdf2Dom provides a PDF parser that converts documents to an HTML DOM representation. This DOM tree can then be serialized to an HTML file or used for further processing.
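If you want the DOM tree itself for further processing rather than a serialized file, the following is a minimal sketch. It assumes pdf2dom on top of PDFBox 2.x (where `PDDocument.load(InputStream)` exists) and that `PDFDomTree.createDOM(PDDocument)` returns an `org.w3c.dom.Document`. The class name `PdfDomExample` and the `countElements` helper are hypothetical, only there to illustrate what "further processing" could look like.

```java
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.ParserConfigurationException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.fit.pdfdom.PDFDomTree;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper class: builds the HTML DOM from a PDF stream (assumes pdf2dom + PDFBox 2.x)
public class PdfDomExample {

    /** Parses a PDF stream into an org.w3c.dom.Document instead of writing text out. */
    public static Document toDom(InputStream pdfStream)
            throws IOException, ParserConfigurationException {
        // try-with-resources closes the PDDocument; the returned DOM is independent of it
        try (PDDocument pdf = PDDocument.load(pdfStream)) {
            PDFDomTree parser = new PDFDomTree();
            // createDOM builds the HTML DOM tree and returns it
            return parser.createDOM(pdf);
        }
    }

    /** Hypothetical "further processing" step: count elements with a given tag name. */
    public static int countElements(Document dom, String tagName) {
        NodeList nodes = dom.getElementsByTagName(tagName);
        return nodes.getLength();
    }
}
```

From there the standard `org.w3c.dom` API can be used to walk or modify the tree before serializing it yourself.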

Here's a small code example; I tried it and it worked well:

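    import java.io.File;
    import java.io.IOException;
    import java.io.PrintWriter;
    import java.io.Writer;
    import javax.xml.parsers.ParserConfigurationException;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.fit.pdfdom.PDFDomTree;
    import org.fit.pdfdom.PDFDomTreeConfig;

    private void convert() {
        // try-with-resources closes both the PDF and the writer, even if parsing fails
        try (PDDocument pdf = PDDocument.load(new File(SOURCE_PDF));
             Writer output = new PrintWriter(TARGET_HTML, "UTF-8")) {
            PDFDomTree parser = new PDFDomTree(PDFDomTreeConfig.createDefaultConfig());
            // Parse the PDF into an HTML DOM and serialize it to the writer
            parser.writeText(pdf, output);
        } catch (IOException | ParserConfigurationException e) {
            // Handle errors
        }
    }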
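Since the question starts from a `ByteArrayInputStream` and wants a `String` rather than a file, the same approach can be adapted by loading the document from the stream and writing into a `StringWriter`. This is a sketch, assuming pdf2dom with PDFBox 2.x (which provides `PDDocument.load(InputStream)`); `PdfToHtml` and `toHtml` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringWriter;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.ParserConfigurationException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.fit.pdfdom.PDFDomTree;
import org.fit.pdfdom.PDFDomTreeConfig;

// Hypothetical helper class (assumes pdf2dom + PDFBox 2.x on the classpath)
public class PdfToHtml {

    /** Converts a PDF supplied as an InputStream into an HTML string, entirely in memory. */
    public static String toHtml(InputStream pdfStream)
            throws IOException, ParserConfigurationException {
        // Load the PDF directly from the stream; no temporary file is needed
        try (PDDocument pdf = PDDocument.load(pdfStream)) {
            PDFDomTree parser = new PDFDomTree(PDFDomTreeConfig.createDefaultConfig());
            // StringWriter collects the serialized HTML in memory
            StringWriter output = new StringWriter();
            parser.writeText(pdf, output);
            return output.toString();
        }
    }

    public static void main(String[] args) throws Exception {
        // Demo only: read some PDF bytes, then wrap them like the question's ByteArrayInputStream
        byte[] pdfBytes = Files.readAllBytes(Paths.get(args[0]));
        String html = toHtml(new ByteArrayInputStream(pdfBytes));
        System.out.println(html);
    }
}
```

With a `StringWriter` nothing is written to disk, which fits the in-memory `ByteArrayInputStream` described in the question.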
