Difficulties extracting XML from Word document with Python
I'm trying to extract the XML from a Word document with Python using the code found on this webpage.
I began by creating a test document named test.docx. I then ran the following code:
import zipfile
from lxml import etree

def getXml(docxFilename):
    zip = zipfile.ZipFile(open(docxFilename))
    xmlContent = zip.read("word/document.xml")
    return xmlContent

def getXmlTree(xmlContent):
    return etree.fromstring(xmlContent)

testXml = getXml("test.docx")
print(getXmlTree(testXml))
Running this code produced the error message "File is not a zip file". What did I do wrong?
You need to pass the path of the docx file as the argument, rather than the opened docx file itself. Compress the file and give the path in zip format, for example: "D:/Users/John/docs/data.zip"
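For what it's worth, a .docx file is already a ZIP archive, so it should be possible to hand its path straight to zipfile.ZipFile without compressing or renaming anything. A likely cause of the error is that open(docxFilename) opens the file in text mode, which zipfile cannot read as an archive. Below is a minimal sketch under that assumption. The names get_xml and get_xml_tree are just renamed versions of the functions in the question, and the standard library's xml.etree.ElementTree is swapped in for lxml so that the example has no third-party dependency.

```python
import zipfile
import xml.etree.ElementTree as ET


def get_xml(docx_filename):
    # Pass the path itself; ZipFile then opens the file in binary mode.
    # (Passing open(docx_filename, "rb") should also work.)
    with zipfile.ZipFile(docx_filename) as archive:
        # The main document body lives at this member name inside the archive.
        return archive.read("word/document.xml")


def get_xml_tree(xml_content):
    # Parse the raw XML bytes into an element tree and return its root element.
    return ET.fromstring(xml_content)
```

With the question's file this would be called as get_xml_tree(get_xml("test.docx")). If the error still appears, zipfile.is_zipfile("test.docx") can be used to check whether the file really is a ZIP archive; an old-style .doc file renamed to .docx, for instance, would not be.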