Convert PDF to HTML without losing any formatting
I'm developing a Python Flask web app and I'm trying to convert user-uploaded PDFs to nicely formatted HTML, like the HTML that is produced when you display a PDF inside an iframe.
I have tried several things so far:
The pdfminer.six library, which produced messy HTML.
pdf2htmlEX ( https://github.com/pdf2htmlEX/pdf2htmlEX ), which produced exactly what I wanted. Locally this solution worked great, but in production (Heroku) I was unable to install it correctly. The project is deprecated and the documentation is limited and terrible. The problem has something to do with broken dependencies.
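For reference, here is a minimal sketch of how pdf2htmlEX might be called from Python, for example from a Flask upload handler. It assumes the pdf2htmlEX binary is already installed and on PATH, which is exactly the part that fails on Heroku. The helper names build_command and convert_pdf are hypothetical and not taken from the original post.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(pdf_path: Path, out_dir: Path) -> list[str]:
    """Return the pdf2htmlEX command line for a single PDF (hypothetical helper)."""
    # --dest-dir tells pdf2htmlEX where to write the output;
    # the last argument is the output file name inside that directory.
    return [
        "pdf2htmlEX",
        "--dest-dir", str(out_dir),
        str(pdf_path),
        pdf_path.stem + ".html",
    ]


def convert_pdf(pdf_path: Path, out_dir: Path) -> Path:
    """Convert one PDF to HTML and return the path of the generated file."""
    # Fail early with a clear message when the binary is missing,
    # which is the situation described for the Heroku deployment.
    if shutil.which("pdf2htmlEX") is None:
        raise RuntimeError("pdf2htmlEX is not installed or not on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(build_command(pdf_path, out_dir), check=True)
    return out_dir / (pdf_path.stem + ".html")
```

Passing the arguments as a list rather than a shell string avoids quoting problems with user-supplied file names.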
So, how can I convert PDFs to HTML effectively without losing any formatting, using Python or any other tool?
Thanks a lot.
If anyone is willing to help me get pdf2htmlEX to work on Heroku, leave a comment and I will post more details in a separate post.
This is not going to be trivial, but I'll give some pointers.
You need an app.json in which you define your buildpacks.
https://devcenter.heroku.com/articles/app-json-schema#buildpacks
If this project is available via apt, it's going to be easy. You just use Heroku's Apt buildpack and define an Aptfile that says which packages it needs to install. The buildpack then installs them automatically and you are done.
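As a rough sketch of what those two files could contain, the Python script below writes an app.json that lists the Apt buildpack ahead of the Python buildpack, plus an Aptfile next to it. The function name write_heroku_files is hypothetical, and the package name in the Aptfile is a placeholder: which package (if any) provides pdf2htmlEX on your Heroku stack is an assumption you would need to check.

```python
import json
from pathlib import Path

# Buildpacks run in the order listed: apt first, so the system packages
# are in place before the Python buildpack sets up the app.
APP_JSON = {
    "buildpacks": [
        {"url": "https://github.com/heroku/heroku-buildpack-apt"},
        {"url": "https://github.com/heroku/heroku-buildpack-python"},
    ]
}

# One package per line. "example-package" is a placeholder, not a real
# package name; replace it with whatever your tool actually needs.
APTFILE_LINES = ["example-package"]


def write_heroku_files(project_dir: Path) -> None:
    """Write app.json and Aptfile into the project root (hypothetical helper)."""
    project_dir.mkdir(parents=True, exist_ok=True)
    (project_dir / "app.json").write_text(json.dumps(APP_JSON, indent=2) + "\n")
    (project_dir / "Aptfile").write_text("\n".join(APTFILE_LINES) + "\n")


if __name__ == "__main__":
    write_heroku_files(Path("."))
```

In practice you would normally write both files by hand; the script is only a compact way of showing their contents.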
If it is not available as a package, you will need to create your own buildpack.
https://devcenter.heroku.com/articles/buildpack-api
Example used here.
Another solution is to dockerize your project and run it as a Docker container.