How to increment a paragraph object in a Word document using python-docx?

I'm searching Word documents to get descriptions of things that are written in the docs. However, these docs are not all formatted the same. The one thing that is consistent is that the text block I want always comes right after the heading 'Description'. So I'd search for 'Description' and then get the text of the next paragraph object after it. How can I increment the paragraph object (so to speak)?

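import os
import docx  # python-docx

# rootdir: the top-level folder to search
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        if not file.endswith('.docx'):
            continue  # python-docx can only open .docx files
        doc = docx.Document(os.path.join(subdir, file))  # join with subdir, not rootdir
        for paragraph in doc.paragraphs:
            if 'Description' in paragraph.text:
                print(paragraph[i+1].text) #I know you can't do i+1 but
                                           #that's essentially what I want to do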

A simple approach would be:

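# doc is a docx.Document, e.g. one opened inside the loop from the question
paragraphs = list(doc.paragraphs)

for i, paragraph in enumerate(paragraphs):
    # guard against 'Description' appearing in the very last paragraph
    if 'Description' in paragraph.text and i + 1 < len(paragraphs):
        print(paragraphs[i + 1].text)  # the paragraph right after it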

If you know for sure that the 'Description' label always appears in a paragraph with the Heading 1 style, you can also check each paragraph's style, so you don't get false positives on a body paragraph that just happens to use that word.
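A minimal sketch of that style check, assuming the label sits in a paragraph whose style python-docx reports as "Heading 1" via `paragraph.style.name`. The helper `text_after_heading` and its parameters are hypothetical. It only needs objects with `.text` and `.style.name`, so it works on `doc.paragraphs` and can also be tried without a .docx file, as the stand-ins at the bottom do.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def text_after_heading(paragraphs: Iterable, label: str = "Description",
                       heading_style: str = "Heading 1") -> Optional[str]:
    """Return the text of the paragraph right after the first paragraph that
    has style `heading_style` and contains `label`; None if there is none.

    Expects python-docx-like paragraphs: each has `.text` and `.style.name`.
    """
    paragraphs = list(paragraphs)
    # The last paragraph has no successor, so it is never a candidate.
    for i, paragraph in enumerate(paragraphs[:-1]):
        style = paragraph.style
        style_name = style.name if style is not None else ""
        if style_name == heading_style and label in paragraph.text:
            return paragraphs[i + 1].text
    return None


if __name__ == "__main__":
    # Stand-ins for python-docx paragraphs, so the example runs without a .docx.
    def fake(text, style):
        return SimpleNamespace(text=text, style=SimpleNamespace(name=style))

    sample = [
        fake("The Description field is optional.", "Normal"),  # body text: ignored
        fake("Description", "Heading 1"),
        fake("A widget that frobs.", "Normal"),
    ]
    print(text_after_heading(sample))  # prints "A widget that frobs."
```

With a real document you would call it as `text_after_heading(docx.Document(path).paragraphs)`, where `path` is one of the files found by the `os.walk` loop.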

If you're looking to extract the text and search it that way, python-docx2txt will give you fewer headaches. It was adapted from python-docx.
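With the extracted-text route, the whole document becomes one string, so "the next paragraph" turns into "the next non-empty line". A minimal sketch, assuming the text came from `docx2txt.process(path)` and that the label sits on its own line; the helper `description_from_text` is hypothetical. Blank lines are skipped because the extracted text may contain empty lines between paragraphs.

```python
from typing import Optional


def description_from_text(text: str, label: str = "Description") -> Optional[str]:
    """Return the first non-empty line after the first line containing `label`,
    or None if the label is missing or nothing follows it."""
    lines = [line.strip() for line in text.splitlines()]
    for i, line in enumerate(lines):
        if label in line:
            # Skip blank lines between the label and the text block.
            for following in lines[i + 1:]:
                if following:
                    return following
            return None
    return None


if __name__ == "__main__":
    # With docx2txt installed: text = docx2txt.process("report.docx")  (hypothetical file)
    text = "Spec sheet\n\nDescription\n\n\nA widget that frobs.\nPrice: 3"
    print(description_from_text(text))  # prints "A widget that frobs."
```

Unlike the python-docx approach, this loses style information, so it cannot tell a "Description" heading from a body line that mentions the word.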

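Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM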