How to extract text from a .docx file that contains tables, using Python

Question

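I have a .docx file that contains tables, headings, and so on, and I would like to know how to extract the text from it. The only sample code I could find works with paragraphs, and it does not work for my file.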

Here is the code:

    doc = docx.Document(self.filename)
    fullText = []
    for para in doc.paragraphs:
        txt = para.text.encode('ascii', 'ignore')
        fullText.append(txt)
    self.text = '\n'.join(fullText)

When I run this code, I get this error:

      File "annotatorConnections.py", line 75, in openFile
        self.text = '\n'.join(fullText)
    TypeError: sequence item 0: expected str instance, bytes found

Answer 1

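Because the items in fullText are bytes rather than str, you can make the join work by using a bytes separator. Note that self.text is then a bytes object as well: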

    doc = docx.Document(self.filename)
    fullText = []
    for para in doc.paragraphs:
        # encode() returns bytes, so every item in fullText is a bytes object
        txt = para.text.encode('ascii', 'ignore')
        fullText.append(txt)
    # Add the b prefix so the separator is bytes too and join() accepts the items
    self.text = b'\n'.join(fullText)
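
If you would rather end up with a str, and you also need the text inside the tables, a possible alternative is to skip the encode step and walk doc.tables in addition to doc.paragraphs. The following is only a minimal sketch, not a tested drop-in for your class: collect_text, extract_docx_text and the ascii_only flag are names I made up for illustration. It assumes python-docx is installed and relies on its paragraphs, tables, rows, cells and text attributes. Since doc.paragraphs does not include text inside tables, the tables are read separately, which means the table rows are appended after the body text rather than in document order.

```python
def collect_text(doc, ascii_only=False):
    """Return body paragraphs and table cells of a python-docx style document as one str."""
    # Body-level paragraphs; .text is already a str, so no encode() is needed.
    lines = [para.text for para in doc.paragraphs]
    # Text inside tables is not part of doc.paragraphs, so walk the tables too.
    for table in doc.tables:
        for row in table.rows:
            # One output line per table row, cells separated by tabs.
            lines.append('\t'.join(cell.text for cell in row.cells))
    text = '\n'.join(lines)
    if ascii_only:
        # Drop non-ASCII characters, then decode again so the result stays a str.
        text = text.encode('ascii', 'ignore').decode('ascii')
    return text


def extract_docx_text(path, ascii_only=False):
    """Open a .docx file with python-docx (imported lazily) and collect its text."""
    import docx  # third-party package: python-docx
    return collect_text(docx.Document(path), ascii_only)
```

In your openFile method this would be used as self.text = extract_docx_text(self.filename). Pass ascii_only=True only if you really want non-ASCII characters removed, as in your original code.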

Solution 1
0 2020-04-12 05:03:18
