简体   繁体   中英

Extract Page 1 Header from Word Doc

I'm trying to extract multiple lines of text from the page 1 Header of an MS Word Document (.docx). I'm using python.docx but can't determine how specific I need to get in order to get only the 1st page header.

Code is currently:

from docx import Document
document = Document("path.docx")
section = document.sections[0]
header = section.header
print(header.paragraphs[0].text)

With the output: "Name of File; Smith; Page"

Screenshots linked for the content I'm referring to with Headers versus Running Header. I want the Header, I don't care about the Running Header: Header 1 Running Header

Any help appreciated! I've looked at the documentation for headers in general ( https://python-docx.readthedocs.io/en/latest/user/hdrftr.html ) but it does not go into specifics for dealing with the Different First Page Header feature of MS Word.

In Word, each section has three headers and three footers.

They are not by page but there is the primary (odd-page) header, the even-page header, and the first-page header.

There is no Sections(0), the number starts with 1. Every document has at least one section. Here is my web page on sections if you need more about them and headers and footers.

The header on the first page will be either the first-page header of Section 1 or the primary header of Section 1. The code for the primary is Activedocument.Sections(1).Headers(wdHeaderFooterPrimary).Range.Text ; that for the first-page is Activedocument.Sections(1).Headers(wdHeaderFooterFirstPage).Range.Text .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM