
Output PCL from Word document using Python

I'm building a web application that will take MS Word documents (and possibly input from a web-based rich text editor), substitute values into the form-field placeholders in those documents, and generate a PCL document as output.

I'm developing in Python and Django on Windows, but the whole solution will be deployed to a web host (yet to be chosen), which in practice means it will need to run on Linux.

I'm open to Linux-only solutions if that's the only way. I'm open to solutions that involve talking to a server written in another language. I am able to write C++ or Java if necessary to get this done. The final output does have to be in PCL format.

My question is: what is a good tool chain for generating PCL from Word documents using Python?

I am considering using some kind of interface to OpenOffice to open the Word documents, do the substitutions, and send the output to some kind of printer driver. Does anyone have experience with this? What libraries would you recommend?
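For reference, one way to drive OpenOffice/LibreOffice from Python without a UNO binding is its headless command line. The sketch below only covers the conversion step (Word to PDF, as an intermediate format rather than a printer driver) and does no substitution. It assumes a `soffice` executable is on the PATH; the function names are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_soffice_command(docx_path, out_dir, soffice="soffice"):
    """Return the argv for a headless Word-to-PDF conversion."""
    return [
        soffice,
        "--headless",            # no GUI, suitable for a server
        "--convert-to", "pdf",   # target format
        "--outdir", str(out_dir),
        str(docx_path),
    ]


def word_to_pdf(docx_path, out_dir, soffice="soffice"):
    """Convert docx_path to PDF and return the expected output path."""
    if shutil.which(soffice) is None:
        raise FileNotFoundError(f"{soffice!r} not found on PATH")
    subprocess.run(build_soffice_command(docx_path, out_dir, soffice), check=True)
    # soffice names the output after the input file, with a .pdf suffix
    return Path(out_dir) / (Path(docx_path).stem + ".pdf")
```

The resulting PDF would still need a separate PDF-to-PCL step.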

Options for interfacing that I have identified include the following; any other suggestions would be greatly welcomed:

A second approach would be to use something like paradocx ( https://bitbucket.org/yougov/paradocx/wiki/Home ) to open the Word files, do the substitutions with that in Python, then somehow interface with something that can output PCL. Again, any experience or comments on this approach would be appreciated.
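To make the substitution step concrete, here is a minimal sketch that does not use paradocx at all, only the standard library: a .docx file is a zip archive whose main body lives in `word/document.xml`. It assumes placeholders are literal `{{key}}` tokens in the text (a hypothetical convention, not real Word form fields), that Word has not split a token across several runs, and that the XML part is UTF-8.

```python
import io
import zipfile
from xml.sax.saxutils import escape


def substitute_placeholders(docx_bytes, values):
    """Return a copy of the .docx with {{key}} tokens replaced in the main body."""
    src = zipfile.ZipFile(io.BytesIO(docx_bytes))
    out_buf = io.BytesIO()
    with src, zipfile.ZipFile(out_buf, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                xml = data.decode("utf-8")  # assumption: UTF-8 encoded part
                for key, value in values.items():
                    # escape() keeps &, < and > from breaking the XML
                    xml = xml.replace("{{" + key + "}}", escape(str(value)))
                data = xml.encode("utf-8")
            # every other archive member is copied through unchanged
            dst.writestr(item, data)
    return out_buf.getvalue()
```

Real form fields are stored as structured XML, so a production version would probably need an XML-aware library rather than string replacement.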

I will very much appreciate any comments on tools and toolchains, and any ideas or recipes that you may have.

This question covers similar ground to, but is not the same as: How to Create PCL file from MS word

Ghostscript can read PS (PostScript) or PDF and create PCL. You can use Python libraries or just subprocess....
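A minimal sketch of the subprocess route, assuming the Ghostscript executable is available as `gs` (on Windows it is typically `gswin64c`). The `ljet4` device emits PCL 5; `pxlmono` and `pxlcolor` emit PCL XL instead. The function names are illustrative only.

```python
import subprocess


def build_gs_command(src_path, pcl_path, device="ljet4", gs="gs"):
    """Return the argv that asks Ghostscript to render a PS/PDF file as PCL."""
    return [
        gs,
        "-dBATCH",      # exit after the last input file
        "-dNOPAUSE",    # do not prompt between pages
        "-dSAFER",      # restrict file access from the input document
        "-q",           # suppress the startup banner
        f"-sDEVICE={device}",
        f"-sOutputFile={pcl_path}",
        str(src_path),
    ]


def to_pcl(src_path, pcl_path, device="ljet4", gs="gs"):
    """Run Ghostscript and return the path of the PCL file it wrote."""
    subprocess.run(build_gs_command(src_path, pcl_path, device, gs), check=True)
    return pcl_path
```

For example, `to_pcl("letter.pdf", "letter.pcl")` would write `letter.pcl` next to the input, raising `CalledProcessError` if Ghostscript fails.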

OK, so my final solution involved creating a Java web service to perform my transcoding.

  • Docx4j provides a class org.docx4j.convert.out.pdf.viaXSLFO.Conversion which hooks into Apache FOP to convert Docx to PDF; that can easily be hacked to convert to PCL (because FOP outputs PCL)
  • Spark is a lightweight Java web framework which allowed me to wrap my transcoder in a web service
  • Because I also manipulate the document, I need to have some metadata, so the perfect thing is a multipart form. I decode that using Apache Commons FileUpload

In almost all cases, I had to upgrade to the development versions of libraries to get this to work.

On the Python side I use:

  • requests to communicate with the web service
  • poster to prepare the multi-part request
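Roughly, the Python side of such a call could look like the sketch below. It is not the code from this answer: it lets requests build the multipart body itself through its `files=` argument instead of using poster, and the URL, the form field names (`document`, `metadata`) and the JSON encoding of the metadata are all assumptions for illustration.

```python
import json

DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def transcode_to_pcl(session, url, docx_bytes, metadata, timeout=60):
    """POST a .docx plus metadata as a multipart form and return the PCL bytes.

    `session` is expected to be a requests.Session (or anything with a
    compatible post() method, which keeps the function easy to test).
    """
    files = {
        # (filename, content, content type); the field name is hypothetical
        "document": ("input.docx", docx_bytes, DOCX_MIME),
    }
    data = {"metadata": json.dumps(metadata)}  # hypothetical field name
    response = session.post(url, data=data, files=files, timeout=timeout)
    response.raise_for_status()
    return response.content
```

It would be called along the lines of `transcode_to_pcl(requests.Session(), "http://localhost:4567/transcode", docx_bytes, {"name": "Ann"})`, where the endpoint is again made up.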
