

Read from a Word file in Python

How can I read from a Word (.docx) file in Python? I can read from a .txt file but cannot do the same for an MS Office Word document. Any suggestions?

There are a couple of packages that let you do this. Check:

  1. python-docx.

  2. docx2txt (note that it does not seem to work with .doc). As per this, it seems to extract more information than python-docx. From the original documentation:

import docx2txt

# extract text
text = docx2txt.process("file.docx")

# extract text and write images in /tmp/img_dir
text = docx2txt.process("file.docx", "/tmp/img_dir") 
  3. textract (which works via docx2txt).

  4. Since .docx files are simply .zip files with a changed extension, this shows how to access the contents. This is a significant difference from .doc files, and the reason why some (or all) of the above do not work with .doc files. In that case, you would likely have to convert .doc to .docx first. Alternatively, antiword can extract the text of a .doc file directly.
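A minimal, standard-library-only sketch of the ZIP point above, assuming the usual .docx layout in which the main body lives in `word/document.xml`. It opens the archive with `zipfile`, parses that part with `xml.etree.ElementTree`, and joins the `<w:t>` text runs of each `<w:p>` paragraph. The helper name `docx_paragraphs` is hypothetical, and the sketch deliberately ignores tabs, line breaks, headers, footers and footnotes. A legacy .doc file is not a ZIP archive, so it is rejected here for the same reason the libraries above reject it.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml in .docx packages.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(source):
    """Return the text of each <w:p> paragraph in a .docx file's main body.

    `source` may be a filename or a binary file-like object.
    Simplification: <w:tab/> and <w:br/> elements are dropped, and only
    word/document.xml is read (no headers, footers or footnotes).
    """
    try:
        archive = zipfile.ZipFile(source)
    except zipfile.BadZipFile:
        # Legacy .doc files are OLE2 compound files, not ZIP archives.
        raise ValueError("not a .docx (ZIP) file; convert a .doc first") from None
    with archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    # iter() walks the whole tree, so paragraphs inside table cells are included.
    for para in root.iter(W_NS + "p"):
        runs = (t.text or "" for t in para.iter(W_NS + "t"))
        paragraphs.append("".join(runs))
    return paragraphs
```

For example, `print("\n".join(docx_paragraphs("file.docx")))` prints one line per paragraph. This is roughly the kind of XML walking that the packages listed above do for you, with far more complete handling of the format.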

See this library, which allows reading .docx files: https://python-docx.readthedocs.io/en/latest/

You can use the python-docx library available on PyPI. Then you can use the following:

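import docx  # provided by the python-docx package (pip install python-docx)

doc = docx.Document('myfile.docx')
allText = []
for docpara in doc.paragraphs:
    allText.append(docpara.text)
print('\n'.join(allText))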
