How do I increment the paragraph object in a Word document using python-docx?

Question

I'm searching Word documents to get descriptions of things that are written in the docs. However, these docs are not all formatted the same way. One thing that is consistent is that the text block I want always comes right after the title 'Description'. So I'd search for 'Description' and then get the text of the next paragraph object after it. How can I increment the paragraph object (so to speak)?

import os
import docx

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        # join with subdir (the directory currently being walked), not rootdir
        doc = docx.Document(os.path.join(subdir, file))
        for paragraph in doc.paragraphs:
            if 'Description' in paragraph.text:
                print(paragraph[i+1].text)  # I know you can't do i+1, but
                                            # that's essentially what I want to do

Answer 1

A simple approach would be:

paragraphs = list(doc.paragraphs)

# stop one short of the end so paragraphs[i+1] always exists
for i in range(len(paragraphs) - 1):
    paragraph = paragraphs[i]
    if 'Description' in paragraph.text:
        print(paragraphs[i+1].text)

If you know for sure that the description label appears in a paragraph with the Heading 1 style, you could further restrict the match to heading paragraphs so you don't get false positives on a paragraph that just happens to use that word.
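
A minimal sketch of that stricter check, assuming the label paragraph uses the built-in 'Heading 1' style and contains only the word 'Description'. The names `text_after_heading` and `description_from_file` are made up for this example and are not part of python-docx; the helper relies only on each paragraph's `.text` and `.style.name` attributes, which python-docx paragraphs provide.

```python
def text_after_heading(paragraphs, label="Description", style_name="Heading 1"):
    """Return the text of the paragraph following the labelled heading, or None.

    `paragraphs` is any sequence of objects with `.text` and `.style.name`,
    such as `doc.paragraphs` from python-docx. The function name and the
    default arguments are assumptions made for this example.
    """
    paragraphs = list(paragraphs)
    # zip pairs each paragraph with its successor, so the last paragraph
    # is never treated as a heading and no index can run off the end.
    for heading, following in zip(paragraphs, paragraphs[1:]):
        if heading.style.name == style_name and heading.text.strip() == label:
            return following.text
    return None


def description_from_file(path):
    """Open a .docx file with python-docx and look up its description."""
    import docx  # imported lazily so text_after_heading works without python-docx

    return text_after_heading(docx.Document(path).paragraphs)
```

Comparing the whole heading text, rather than using `in`, is a deliberate choice here: it skips body paragraphs and longer headings that merely mention the word. If the headings in your documents read something like 'Description:', loosen the comparison accordingly.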

Answer 2

If you're looking to extract text and search that way, python-docx2txt will give you fewer headaches. It was adapted from python-docx.
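
As a rough sketch of that route: `docx2txt.process(path)` returns the document's text as a single string, which can then be scanned line by line. The helper names `line_after` and `description_from_docx` are hypothetical, and the assumption that the label sits alone on its own line (possibly followed by blank lines) may not hold for every document.

```python
def line_after(text, label="Description"):
    """Return the first non-blank line after a line equal to `label`, or None."""
    lines = [line.strip() for line in text.splitlines()]
    for i, line in enumerate(lines):
        if line == label:
            # Skip blank separator lines between the label and its text.
            for candidate in lines[i + 1:]:
                if candidate:
                    return candidate
    return None


def description_from_docx(path):
    """Extract plain text with docx2txt, then look for the description."""
    import docx2txt  # third-party package; imported lazily

    return line_after(docx2txt.process(path))
```

Plain-text extraction drops style information, so this version cannot tell a heading from a body line that consists only of the label.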

Answer 1: score 2, accepted, 2016-06-30 22:41:03

Answer 2: score 1, 2016-06-30 20:53:24
