Maintaining document heading hierarchy with python-docx

Question

I am developing an algorithm to extract the sections of a .docx file while preserving the document structure. I managed to get the headings, but how do I get the content between headings and keep the heading hierarchy? This is what I have done so far.

Sample code:

from docx import Document

document = Document('headerEX.docx')
paragraphs = document.paragraphs


def iter_headings(paragraphs):
    # Built-in heading styles are named 'Heading 1', 'Heading 2', ...
    for paragraph in paragraphs:
        if paragraph.style.name.startswith('Heading'):
            yield paragraph


for heading in iter_headings(paragraphs):
    print(heading.text)
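To keep the hierarchy, the level of each heading is needed as well as its text. A minimal sketch, assuming the document uses the built-in styles named 'Heading 1' through 'Heading 9'; the helper names `heading_level` and `iter_headings_with_level` are made up for illustration and are not part of python-docx:

```python
import re

# Matches the built-in paragraph style names 'Heading 1' .. 'Heading 9'.
_HEADING_RE = re.compile(r'^Heading (\d+)$')


def heading_level(paragraph):
    """Return the heading level as an int, or None for a non-heading paragraph."""
    match = _HEADING_RE.match(paragraph.style.name)
    return int(match.group(1)) if match else None


def iter_headings_with_level(paragraphs):
    """Yield (level, text) for every heading paragraph, in document order."""
    for paragraph in paragraphs:
        level = heading_level(paragraph)
        if level is not None:
            yield level, paragraph.text
```

Called as iter_headings_with_level(document.paragraphs), this only reads paragraph.style.name and paragraph.text, so indenting each printed heading by its level is enough to show the outline.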

Answer 1

Something like this should give you a start:

sections = []
section_heading = None
section_paragraphs = []
for paragraph in document.paragraphs:
    if paragraph.style.name.startswith('Heading'):
        # A new heading closes the section collected so far.
        section = {
            'heading': section_heading,
            'paragraphs': section_paragraphs
        }
        sections.append(section)
        # Start collecting body paragraphs for the new heading.
        section_heading = paragraph.text
        section_paragraphs = []
        continue
    section_paragraphs.append(paragraph)

for section in sections:
    print(section['heading'])
    for paragraph in section['paragraphs']:
        print(paragraph.text)

As written, this may give you an empty section as the first one, and it will not capture the last section. I leave those details to you as an exercise to strengthen your coding skills :)
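One possible way to handle both edge cases and also record the hierarchy is sketched below. It is not the answerer's solution. It assumes headings use the built-in 'Heading N' styles, and the function name `split_sections` and the 'level' and 'path' keys are invented for this example. Each section is appended as soon as its heading is seen, so the last one is never lost, and a heading-less leading section is created only if body text really precedes the first heading.

```python
import re

# Matches the built-in paragraph style names 'Heading 1' .. 'Heading 9'.
_HEADING_RE = re.compile(r'^Heading (\d+)$')


def split_sections(paragraphs):
    """Group paragraphs under the heading that precedes them.

    Returns a list of dicts with the keys 'level', 'heading', 'path'
    (headings from the outermost ancestor down to this one) and
    'paragraphs' (the body paragraph objects).
    """
    sections = []
    stack = []      # (level, heading text) of the headings currently open
    current = None  # section that body paragraphs are appended to
    for paragraph in paragraphs:
        match = _HEADING_RE.match(paragraph.style.name)
        if match:
            level = int(match.group(1))
            # Close every open heading at the same or a deeper level.
            while stack and stack[-1][0] >= level:
                stack.pop()
            stack.append((level, paragraph.text))
            current = {
                'level': level,
                'heading': paragraph.text,
                'path': [text for _, text in stack],
                'paragraphs': [],
            }
            sections.append(current)
        else:
            if current is None:
                # Body text before the first heading gets its own section.
                current = {'level': 0, 'heading': None,
                           'path': [], 'paragraphs': []}
                sections.append(current)
            current['paragraphs'].append(paragraph)
    return sections
```

With python-docx this would be called as split_sections(document.paragraphs). Joining a section's 'path' with ' > ' gives a breadcrumb such as 'Chapter 1 > Background' for each block of text.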

Solution 1
0 2018-04-17 19:42:56
