
How to open a PDF file in MS Word through Python

I can easily right-click a PDF file and choose "Open with" Word. MS Word automatically converts the PDF to a docx well enough, without the formatting (I don't need the formatting). I want to automatically open a batch of PDF files and save them to another folder as docx files (preferably through Python). Any suggestions on how to do this?

I have tried Python libraries like PyPDF2, but they do not get all the content of the document. Currently I have to open each PDF file manually in MS Word, save it, and then open and process it using Python.

An easy solution would be to use os.system like this:

import os
os.system("'Path_to_your_word_exe' 'path_to_your_pdf'")

This solution has problems with spaces in paths, so I recommend using subprocess.call instead:

import subprocess
subprocess.call([r'raw path to word', r'raw path to file'])

Example:

subprocess.call([r'C:\Program Files\Microsoft Office\root\Office16\WINWORD.exe', r'C:\Users\gopco\Downloads\SCYR.pdf'])
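To see why the list form copes with spaces, it can help to print the command line that subprocess would build from the list on Windows. The sketch below uses `subprocess.list2cmdline`, a helper in the standard library that is not part of the documented API, so treat it as a debugging aid only. The paths are the ones from the example above.

```python
import subprocess

# Paths taken from the example above; adjust them to your own machine.
word_exe = r'C:\Program Files\Microsoft Office\root\Office16\WINWORD.exe'
pdf_file = r'C:\Users\gopco\Downloads\SCYR.pdf'

# list2cmdline applies the Windows quoting rules that subprocess uses when it
# is given a list: arguments containing spaces are wrapped in double quotes.
command_line = subprocess.list2cmdline([word_exe, pdf_file])
print(command_line)
```

Each list element stays a single argument, so the space in "Program Files" no longer splits the path.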

To automate the job for multiple files in a single directory, use the following code:

import glob
import os

import win32com.client

# start Word
word = win32com.client.Dispatch("Word.Application")
# show the Word window so that error messages (if any) are visible
word.Visible = True

pdfs_path = "./"  # folder with pdfs
reqs_path = "./"  # folder for saving docx files

for doc in glob.iglob(os.path.join(pdfs_path, "*.pdf")):
    # file name without the folder and without the .pdf extension
    name = os.path.splitext(os.path.basename(doc))[0]
    in_file = os.path.abspath(doc)  # absolute path
    print(in_file)
    wb = word.Documents.Open(in_file)  # open the pdf in Word
    # set the file name for saving the docx
    out_file = os.path.abspath(os.path.join(reqs_path, name + ".docx"))
    print("outfile\n", out_file)
    wb.SaveAs2(out_file, FileFormat=16)  # 16 is the file format for docx
    print("success...")
    wb.Close()

word.Quit()
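One possible refinement, offered as a sketch rather than a tested drop-in: work out the input and output paths in a separate function, and wrap the Word calls in try/finally so that Word is closed even if one file fails. The function names `plan_conversions` and `convert_all` are hypothetical, and the sketch assumes Windows with Word and pywin32 installed for the conversion step.

```python
from pathlib import Path

WD_FORMAT_DOCX = 16  # same FileFormat value as in the script above


def plan_conversions(pdf_dir, out_dir):
    """Return (pdf_path, docx_path) pairs of absolute paths for each PDF in pdf_dir."""
    out_dir = Path(out_dir).resolve()
    pairs = []
    for pdf in sorted(Path(pdf_dir).resolve().glob("*.pdf")):
        pairs.append((pdf, out_dir / (pdf.stem + ".docx")))
    return pairs


def convert_all(pdf_dir, out_dir):
    """Open each PDF in Word and save it as docx (needs Windows, Word and pywin32)."""
    import win32com.client  # imported here so plan_conversions works without it

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    word = win32com.client.Dispatch("Word.Application")
    try:
        for pdf, docx in plan_conversions(pdf_dir, out_dir):
            doc = word.Documents.Open(str(pdf))
            try:
                doc.SaveAs2(str(docx), FileFormat=WD_FORMAT_DOCX)
            finally:
                doc.Close()  # close the document even if saving failed
    finally:
        word.Quit()  # always shut Word down, even after an error
```

Calling `convert_all("./pdfs", "./docx")` (hypothetical folder names) would then convert every PDF in the first folder. `plan_conversions` can be run on its own to check the planned file names before Word is started.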
