How to open a PDF file in MS Word through Python
I can easily right-click a PDF file and choose "Open with" Word. MS Word automatically converts the PDF to a docx well enough, without the formatting (I don't need the formatting). I want to automatically open a batch of PDF files and save them to another folder as docx (preferably through Python). Any suggestions on how to do this?
I have tried Python libraries like PyPDF2, but they do not extract all the content of the document. Currently I have to open each PDF manually in MS Word, save it, and then open and process it with Python.
An easy solution would be to use os.system like this:
import os
os.system("'Path_to_your_word_exe' 'path_to_your_pdf'")
This solution has a problem with spaces within a path, so I recommend using subprocess.call instead:
import subprocess
subprocess.call([r'raw path to word', r'raw path to file'])
Example:
subprocess.call([r'C:\Program Files\Microsoft Office\root\Office16\WINWORD.exe', r'C:\Users\gopco\Downloads\SCYR.pdf'])
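To see why the list form copes with spaces, here is a minimal sketch. The names `word_command` and `open_in_word` are hypothetical helpers, the Word path is the one from the example above, and `my file.pdf` is a made-up file name. `subprocess.list2cmdline` is used only to display the command line that subprocess would build on Windows; nothing is launched when the snippet runs.

```python
import subprocess

# Path copied from the example above; adjust it to your Office installation.
WORD_EXE = r"C:\Program Files\Microsoft Office\root\Office16\WINWORD.exe"


def word_command(pdf_path, word_exe=WORD_EXE):
    """Return the argument list for opening pdf_path in Word (hypothetical helper)."""
    return [word_exe, pdf_path]


def open_in_word(pdf_path):
    """Launch Word on one PDF and wait until Word exits; returns its exit code."""
    return subprocess.run(word_command(pdf_path), check=False).returncode


# Display the Windows command line for a (made-up) file name containing a space.
# Each argument that contains a space is wrapped in double quotes for us.
preview = subprocess.list2cmdline(word_command(r"C:\Users\gopco\Downloads\my file.pdf"))
print(preview)
```

Because each path is its own list element, no manual quoting is needed, which is the problem the os.system version runs into.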
To automate the job for multiple files in a single directory, use the following code:
```
import glob
import os
import win32com.client

# start Word
word = win32com.client.Dispatch("Word.Application")
# make Word visible so it can show error messages (if any)
word.Visible = True
pdfs_path = "./"  # folder with pdfs
reqs_path = "./"  # folder for saving docx files
for doc in glob.iglob(os.path.join(pdfs_path, "*.pdf")):
    filename = os.path.basename(doc)  # get just the file name
    in_file = os.path.abspath(doc)  # absolute path
    print(in_file)
    wb = word.Documents.Open(in_file)  # open the pdf in Word
    # set the filename for saving the docx
    out_file = os.path.abspath(os.path.join(reqs_path, filename[0:-4] + ".docx"))
    print("outfile\n", out_file)
    wb.SaveAs2(out_file, FileFormat=16)  # file format for docx
    print("success...")
    wb.Close()
word.Quit()
```
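A possible variation, offered as a sketch rather than a tested replacement: it uses pathlib for the path handling and try/finally so that Word is closed even if one file fails. `docx_target` and `convert_folder` are hypothetical names, and the code assumes Windows with MS Word and the pywin32 package installed.

```python
from pathlib import Path

WD_FORMAT_DOCX = 16  # same FileFormat value as in the code above


def docx_target(pdf_path, out_dir):
    """Map some/folder/name.pdf to out_dir/name.docx (hypothetical helper)."""
    return Path(out_dir) / (Path(pdf_path).stem + ".docx")


def convert_folder(pdf_dir, out_dir):
    """Convert every PDF in pdf_dir with Word; return the .docx paths written."""
    import win32com.client  # imported lazily: needs Windows, Word and pywin32

    out_dir = Path(out_dir).resolve()  # Word needs absolute paths
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    word = win32com.client.Dispatch("Word.Application")
    try:
        for pdf in sorted(Path(pdf_dir).resolve().glob("*.pdf")):
            target = docx_target(pdf, out_dir)
            doc = word.Documents.Open(str(pdf))
            try:
                doc.SaveAs2(str(target), FileFormat=WD_FORMAT_DOCX)
                written.append(target)
            finally:
                doc.Close()
    finally:
        word.Quit()  # runs even if a conversion raised an error
    return written
```

It could be called as `convert_folder("./pdfs", "./docx")`, where both folder names are placeholders. Using `stem` instead of slicing off the last four characters keeps file names such as `report.v2.pdf` intact.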