
Can we convert PDF files to HTML using C, C++, or Java (or any other language)?

I need to convert PDF files into HTML files (on the iOS platform) so that I can annotate the HTML page using JavaScript. I have had some success annotating HTML pages, so if I can convert PDF to HTML I can complete my task. How can I do the conversion?

Converting FROM PDF is generally Very Hard (at best).

PDF contains drawing instructions. "Line from here to there", "these characters at these coordinates". There's usually no information about the logical meaning of these lines, characters, and images, though "Document Structure" is becoming more common.

Without "document structure" and "marked content" it is Very Hard to go from "a pile of lines and characters" to "a table with this information in these columns and rows".

Not impossible, just Very Hard.

And people who have worked on this problem aren't all that interested in sharing their code for free.
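To make the "pile of characters" point concrete, here is a minimal sketch in Python of the simplest possible reconstruction step: turning positioned glyphs back into text lines. The `Glyph` tuple, the function name `group_into_lines`, and both thresholds are hypothetical and chosen only for illustration; a real PDF parser hands you far messier data (rotated text, varying font sizes, overlapping glyphs), and detecting tables or columns is much harder than this.

```python
from typing import List, Tuple

# Hypothetical glyph record: (x, y, width, text), in PDF user-space units.
Glyph = Tuple[float, float, float, str]


def group_into_lines(glyphs: List[Glyph],
                     y_tolerance: float = 2.0,
                     space_gap: float = 1.5) -> List[str]:
    """Rebuild text lines from glyphs that only carry coordinates."""
    lines = []  # each entry: [y of the first glyph on the line, [glyphs]]
    # PDF's y-axis grows upward, so larger y means higher on the page.
    for glyph in sorted(glyphs, key=lambda g: -g[1]):
        if lines and abs(lines[-1][0] - glyph[1]) <= y_tolerance:
            lines[-1][1].append(glyph)      # same line, within tolerance
        else:
            lines.append([glyph[1], [glyph]])  # start a new line

    result = []
    for _, members in lines:
        members.sort(key=lambda g: g[0])    # left to right
        text = members[0][3]
        for prev, cur in zip(members, members[1:]):
            # A horizontal gap wider than space_gap is guessed to be a space.
            gap = cur[0] - (prev[0] + prev[2])
            if gap > space_gap:
                text += " "
            text += cur[3]
        result.append(text)
    return result


if __name__ == "__main__":
    sample = [
        (16.0, 700.0, 5.0, "b"),
        (10.0, 700.0, 5.0, "a"),
        (10.0, 680.0, 5.0, "c"),
        (30.0, 700.5, 5.0, "d"),
    ]
    print(group_into_lines(sample))  # ['ab d', 'c']
```

Even this toy version has to guess at two thresholds, and spaces are inferred rather than read from the file. Paragraphs, columns, and table cells all need further heuristics layered on top, which is where the difficulty described above comes from.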

It will be hard to convert arbitrary PDFs; some of them are too complicated to represent well in HTML.

Take a look at Poppler (libpoppler). It is open source and already includes PDF-to-HTML conversion (the pdftohtml utility), and you can always extend it to fit your requirements.
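As a rough sketch of the Poppler suggestion above, the Python snippet below drives Poppler's `pdftohtml` command-line tool. It assumes the tool is installed and on the PATH (for example via the `poppler-utils` package on Debian/Ubuntu), and that the conversion runs on a desktop or server rather than on the iOS device itself. The helper names `build_pdftohtml_command` and `convert` are hypothetical; only the `pdftohtml` flags come from Poppler.

```python
import shutil
import subprocess
from typing import List


def build_pdftohtml_command(pdf_path: str, out_base: str,
                            keep_layout: bool = True) -> List[str]:
    """Build the argument list for Poppler's pdftohtml."""
    # -s: one HTML document containing all pages; -noframes: no frameset.
    cmd = ["pdftohtml", "-s", "-noframes"]
    if keep_layout:
        # -c: "complex" output that tries to preserve the page layout.
        cmd.append("-c")
    cmd += [pdf_path, out_base]
    return cmd


def convert(pdf_path: str, out_base: str) -> None:
    """Run the conversion; raises if pdftohtml is missing or fails."""
    if shutil.which("pdftohtml") is None:
        raise RuntimeError("pdftohtml not found on PATH")
    subprocess.run(build_pdftohtml_command(pdf_path, out_base), check=True)
```

Calling `convert("input.pdf", "output")` should write the HTML next to the given output base name. With `-c`, the result is typically absolutely positioned text over page images, which keeps the look of the page but carries none of the logical structure discussed earlier.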
