Can we convert PDF files to HTML using C, C++, or Java (or any language)?
I need to convert PDF files into HTML files (iOS platform) so that I can annotate the HTML page using JavaScript. I have had some success annotating HTML pages, so if I can convert PDF to HTML I can complete my task. How can I do the conversion?
Converting FROM PDF is generally Very Hard (at best).
PDF contains drawing instructions: "line from here to there", "these characters at these coordinates". There's usually no information about the logical meaning of these lines, characters, and images, though "Document Structure" is becoming more common.
Without "document structure" and "marked content" it is Very Hard to go from "a pile of lines and characters" to "a table with this information in these columns and rows".
Not impossible, just Very Hard.
And people who have worked on this problem aren't all that interested in sharing their code for free.
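To make the "pile of characters to table" problem concrete, here is a minimal sketch of the kind of heuristic involved. It assumes you have already extracted text fragments with coordinates from the PDF; the `Fragment` type, the tolerance values, and the sample data are all hypothetical and only for illustration. Real documents need far more than this (rotated text, merged cells, multi-line cells, ruling lines).

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Fragment:
    """A run of text as a PDF might position it (hypothetical type)."""
    text: str
    x: float      # left edge, in page units
    y: float      # baseline; PDF y grows upward
    width: float  # horizontal extent of the run


def group_into_rows(fragments, y_tolerance=2.0):
    """Cluster fragments with nearly equal baselines into rows, top to bottom."""
    rows = []
    for frag in sorted(fragments, key=lambda f: (-f.y, f.x)):
        if rows and abs(rows[-1][0].y - frag.y) <= y_tolerance:
            rows[-1].append(frag)
        else:
            rows.append([frag])
    return [sorted(row, key=lambda f: f.x) for row in rows]


def split_into_cells(row, gap=10.0):
    """Start a new cell whenever the horizontal gap exceeds `gap`."""
    cells, current, prev_right = [], "", None
    for frag in row:
        if prev_right is not None and frag.x - prev_right > gap:
            cells.append(current)
            current = ""
        current += frag.text
        prev_right = frag.x + frag.width
    if current:
        cells.append(current)
    return cells


def extract_table(fragments):
    return [split_into_cells(row) for row in group_into_rows(fragments)]


def to_html_table(table):
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(cell)}</td>" for cell in row) + "</tr>"
        for row in table
    )
    return f"<table>{body}</table>"


# Made-up input: "Name" arrives as two runs with slightly different baselines.
fragments = [
    Fragment("3", 150, 680, 6),
    Fragment("me", 64, 700.4, 14),
    Fragment("Qty", 150, 700, 20),
    Fragment("Na", 50, 700, 14),
    Fragment("Pen", 50, 680, 20),
]
table = extract_table(fragments)
html_out = to_html_table(table)
print(html_out)
```

Even in this toy case the result depends entirely on the two guessed thresholds, which is a small taste of why the general problem is hard.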
It will be hard to convert arbitrary PDFs; some of them are too complicated for HTML.
Take a look at Poppler (libpoppler). It already includes PDF-to-HTML conversion (the pdftohtml utility) and it is open source, so you can always extend it to fit your requirements.
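If the conversion can happen on a desktop or server rather than on the device, a quick way to try Poppler is to call its pdftohtml command-line tool. The sketch below assumes pdftohtml is installed and on your PATH; the helper names and file names are hypothetical, and the flags (`-c` for layout-preserving "complex" output, `-noframes` to skip the frameset) should be checked against `pdftohtml -h` for your version.

```python
import shutil
import subprocess


def build_pdftohtml_command(pdf_path, out_base):
    """Build the argument list for Poppler's pdftohtml (flags are assumptions)."""
    return ["pdftohtml", "-c", "-noframes", str(pdf_path), str(out_base)]


def convert(pdf_path, out_base):
    """Run pdftohtml, failing loudly if the tool is missing or exits non-zero."""
    if shutil.which("pdftohtml") is None:
        raise RuntimeError("pdftohtml not found on PATH; install poppler-utils")
    subprocess.run(build_pdftohtml_command(pdf_path, out_base), check=True)


# Example usage (hypothetical file names):
# convert("input.pdf", "output")
```

The generated HTML tends to position text absolutely to mimic the page, so check early whether your annotation code copes with that markup.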