Need advice on displaying (and/or converting) PDF files on the web

First some background: my site has two basic types of users. Users with free accounts can upload documents, and paid customers can then search, view, or download those documents. Uploaders can view only the documents they own, while paid customers can view anything. Currently we only support Word documents (either .doc or .docx) and plain text. We use the JODConverter library to convert Word documents to HTML; the HTML is what's stored in the database and what's displayed to users.
We want to start accepting PDFs as well, but I'm not sure of the best way to either display the PDFs or convert them to HTML. I've seen suggestions to use Google Docs to do the conversion on the fly, but it doesn't seem feasible to restrict access properly, given that the document has to be publicly accessible to Google - please correct me if I'm wrong. It seems like simply embedding the PDF with a tag in the HTML (or using something like PDFBox) would run into the same problem.
Alternatively, we could forget about displaying the PDF files directly and convert them to HTML as we do with Word documents, but I've yet to come across a decent-looking library for that. Everything I've looked at so far either reportedly doesn't do a great job of converting, is Windows-only, and/or has a hefty licensing fee. (A licensing fee isn't necessarily a deal-breaker if it's not more than $100 / year or so.) Does anyone know of a good Java conversion library? (Something that runs from the command line would be acceptable if it actually does a good job.)
One last thing: we plan to offer paid customers the option to download the original PDF files. Is that likely to be complicated? Is there anything I should keep in mind when building the rest of the process?

Instead of converting the PDF into HTML, which means some kind of OCR (recognizing the text), you can convert the PDF into images with a tool like JPedal and create an HTML page that links to those images in sequential order. Since JPedal is a Java library, it's not Windows-only.
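As a rough illustration of the second half of that idea, here is a minimal sketch of building the HTML page once the page images already exist. It assumes the rendering step (JPedal or any other tool) has already produced one image per page; the class name `PdfPageGallery`, the method names, and the image URLs are hypothetical and not part of any library.

```java
import java.util.List;

// Hypothetical helper: builds an HTML page that shows pre-rendered
// PDF page images one after another, in the order given.
final class PdfPageGallery {

    // imageUrls is assumed to hold one URL per PDF page, already in page order.
    static String buildHtml(String title, List<String> imageUrls) {
        StringBuilder html = new StringBuilder();
        html.append("<!DOCTYPE html>\n<html>\n<head><title>")
            .append(escape(title))
            .append("</title></head>\n<body>\n");
        for (int i = 0; i < imageUrls.size(); i++) {
            html.append("<img src=\"").append(escape(imageUrls.get(i)))
                .append("\" alt=\"Page ").append(i + 1).append("\">\n");
        }
        html.append("</body>\n</html>\n");
        return html.toString();
    }

    // Escapes the characters that are unsafe inside HTML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }
}
```

For example, `PdfPageGallery.buildHtml("Report", Arrays.asList("/docs/42/page-1.png", "/docs/42/page-2.png"))` would produce a page with two `<img>` tags in that order. Given the access rules in the question, the image URLs would presumably need to point at an endpoint that checks the user's permissions rather than at publicly readable files.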

Downloading the original PDF files shouldn't be a problem. You just have to set the Content-Type response header to the standard PDF MIME type, application/pdf.
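A minimal sketch of what that could look like, using the JDK's built-in `com.sun.net.httpserver` classes rather than whatever web framework the site actually runs on (which the question doesn't say). The class `PdfDownload` and its methods are hypothetical, and the `Content-Disposition` header is an addition beyond what the answer mentions; it asks the browser to save the file instead of displaying it inline.

```java
import com.sun.net.httpserver.HttpExchange;
import java.io.IOException;
import java.io.OutputStream;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for serving an original PDF as a download.
final class PdfDownload {

    // Response headers for a PDF download; fileName is the name suggested to the browser.
    static Map<String, String> downloadHeaders(String fileName) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Content-Type", "application/pdf");
        headers.put("Content-Disposition",
                "attachment; filename=\"" + sanitize(fileName) + "\"");
        return headers;
    }

    // Replaces quotes, backslashes, and line breaks so the name cannot break the header.
    static String sanitize(String fileName) {
        return fileName.replaceAll("[\"\\\\\\r\\n]", "_");
    }

    // Writes the headers and the PDF bytes to the response.
    // The caller is assumed to have already verified that the user may download this file.
    static void sendPdf(HttpExchange exchange, byte[] pdfBytes, String fileName)
            throws IOException {
        for (Map.Entry<String, String> h : downloadHeaders(fileName).entrySet()) {
            exchange.getResponseHeaders().set(h.getKey(), h.getValue());
        }
        exchange.sendResponseHeaders(200, pdfBytes.length);
        try (OutputStream body = exchange.getResponseBody()) {
            body.write(pdfBytes);
        }
    }
}
```

In a servlet-based application the same two headers would be set on the response object instead. Either way, the permission check (paid customer, or the uploader who owns the document) belongs before any bytes are written.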

