
Thoroughly confused about using .doc APIs

Let me start off by saying my Python knowledge is at a beginner-to-intermediate level, and I recently started using the language again after a long break.

The Goal:

This morning I came across a bunch of Word documents I wanted to convert to PDF and concatenate, with two .doc files making up each PDF. It seemed like a fairly trivial task, so I figured I'd try to learn how to do it in Python. Concatenating PDFs wasn't too bad: I found PyPDF2 and managed to write a script that did just that.
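The "two .doc files per PDF" part of this goal can be made explicit before any conversion happens. A minimal sketch, assuming the two files that belong together sit next to each other when sorted by name; `pair_up` is a hypothetical helper, not part of PyPDF2 or any other library:

```python
def pair_up(paths):
    """Group file paths two at a time, in sorted order.

    Assumption: files that belong in the same PDF sort next to each other
    (e.g. report_01a.doc and report_01b.doc).
    """
    ordered = sorted(paths, key=str)  # key=str works for both str and Path items
    if len(ordered) % 2 != 0:
        raise ValueError(
            "expected an even number of files, got {}".format(len(ordered)))
    # Slice the sorted list into consecutive (first, second) tuples.
    return [(ordered[i], ordered[i + 1]) for i in range(0, len(ordered), 2)]
```

Usage: `pair_up(Path("word_docs").glob("*.doc"))`, where `word_docs` is a placeholder folder name and `Path` comes from `pathlib`. Failing loudly on an odd count is a deliberate choice, so a missing file does not silently shift every later pair.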

But 7 hours later, after countless scripts with broken dependencies, I still can't find a way to automate the .doc-to-PDF conversion.

The Problem(s):

Every script I found either:

  1. uses python-docx, which only reads .docx files (my documents are Word 2003 .doc files)
  2. uses the unoconv bridge (I installed it along with OpenOffice, then searched around for documentation but found none, so I have no idea how to call it from a Python script or the shell. I saw one example for this, but it keeps throwing errors)
  3. uses win32com, win32com.client, pywin32, or some such. I ran into numerous issues with these: I installed one but couldn't import it from code (as happened to the guy here), and now I can't even find them with pip. I searched for documentation on them (are they modules or classes? I have no idea) and found practically nothing I could understand, beyond that they're connected to ActivePython (which is apparently a superset of Python with more capabilities?).
  4. uses comtypes, which I installed but was also unable to use/import for some reason (maybe I'm using pip wrong somehow?)
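For option 3, the naming is less confusing than it looks: pywin32 is the package you install with pip (`pip install pywin32`), and `win32com.client` is a module inside it that lets Python drive COM applications such as Word. A minimal sketch, assuming Microsoft Word is installed on the Windows machine; 17 is Word's `wdFormatPDF` file-format constant, and `word_doc_to_pdf` is a hypothetical helper name:

```python
import os

WD_FORMAT_PDF = 17  # Word's wdFormatPDF constant, passed to SaveAs


def word_doc_to_pdf(doc_path, pdf_path):
    """Open a .doc in Microsoft Word via COM and save it as PDF (Windows only)."""
    # Imported here so the rest of a script still loads on machines without pywin32.
    import win32com.client

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        # Word resolves relative paths against its own working directory,
        # not Python's, so pass absolute paths.
        doc = word.Documents.Open(os.path.abspath(doc_path))
        try:
            doc.SaveAs(os.path.abspath(pdf_path), FileFormat=WD_FORMAT_PDF)
        finally:
            doc.Close()
    finally:
        # Always shut down the hidden Word instance, even if saving failed.
        word.Quit()
```

Without Word installed, the `Dispatch` call raises a COM error, so this route only works on machines that have Office.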

I know my question is hardly focused, but honestly by now my brain is fried from information overload. Any simplifications for a noob would be more than welcome.

TL;DR:

Assuming no knowledge of COM stuff and little experience with any external frameworks:

  1. What would I have to do to convert Word 2003 .doc files to .pdf files? I'm running Python 3.5.1 32-bit on a Windows 10 64-bit machine.
  2. Where can I learn more about accessing other software's APIs from Python? Are there big prerequisites for this stuff, like knowing how the OS works at a lower level?

Thanks!

In my experience, converting between the various office formats is best done outside of Python. With the subprocess module, you can call the external command

soffice --convert-to pdf file.doc --headless

where soffice is the command that comes with LibreOffice.
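Called from Python, that command can be wrapped with `subprocess.run`. A minimal sketch, assuming LibreOffice is installed and `soffice` is reachable on PATH (on Windows it often lives in `C:\Program Files\LibreOffice\program\`, so you may need to pass the full path to `soffice.exe`); `--outdir` is LibreOffice's option for choosing where the PDF is written, and the helper names are hypothetical. It avoids f-strings so it also runs on the question's Python 3.5.1:

```python
import subprocess
from pathlib import Path

# Assumption: soffice is on PATH; otherwise pass the full path to soffice.exe.
SOFFICE = "soffice"


def build_convert_command(doc_path, out_dir, soffice=SOFFICE):
    """Return the argument list for a headless LibreOffice .doc -> .pdf conversion."""
    return [soffice, "--headless", "--convert-to", "pdf",
            "--outdir", str(out_dir), str(doc_path)]


def pdf_path_for(doc_path, out_dir):
    """LibreOffice names the output after the input file, with a .pdf extension."""
    return Path(out_dir) / (Path(doc_path).stem + ".pdf")


def doc_to_pdf(doc_path, out_dir, soffice=SOFFICE):
    """Run the conversion and return the path of the PDF it should produce."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if soffice exits with an error code;
    # a missing soffice executable raises FileNotFoundError instead.
    subprocess.run(build_convert_command(doc_path, out_dir, soffice), check=True)
    return pdf_path_for(doc_path, out_dir)
```

Passing the arguments as a list rather than one string sidesteps quoting problems with spaces in Windows paths. Each pair of .doc files from the question's goal can then be converted one file at a time and the two resulting PDFs passed to the existing PyPDF2 concatenation script.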
