I'm trying to get the section number before each paragraph. The weird thing is that when I use textract to extract text from some docx files, the numbers are dropped. Is there a way to get these numbers back? EX: 1.Term. XXXXXXXXXXXXXXend
I only get 'Term. XXXXXXXXXXXXXXend' in the txt output. My guess is that when these section numbers are entered with Word's numbering feature, they are ignored.
import textract

# `url` is the path to the .docx file
text = textract.process(url, extension='docx')
strText = text.decode("utf8")
children = strText.split('\n\n')  # split the extracted text on blank lines
Thanks in advance
Yes, your hypothesis is correct. The section numbers are not actually stored in the document; they are computed and displayed at runtime only.
The only way to get them is to keep track of them yourself, based on the style of those paragraphs, something like 'Heading 1', 'Heading 2', etc. Numbering can also be assigned in other ways, which makes this more difficult, but it's often done with headings since that's so easy for the author.
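As a rough illustration of "keeping track yourself", here is a minimal sketch. It assumes the numbered paragraphs use the built-in 'Heading N' styles and that the numbers should look like "1.", "1.1.", and so on; both are assumptions, since the real format lives in the document's numbering definitions. It also swaps in python-docx to read paragraph styles, because the plain text returned by textract does not carry style names. The function names `number_paragraphs` and `read_docx_paragraphs` are made up for this example.

```python
import re

# Matches the built-in heading style names, e.g. "Heading 1", "Heading 2".
HEADING_RE = re.compile(r"^Heading (\d+)$")


def number_paragraphs(paragraphs):
    """Prefix heading paragraphs with a reconstructed section number.

    `paragraphs` is an iterable of (style_name, text) pairs.
    Non-heading paragraphs are returned unchanged.
    """
    counters = []  # counters[i] is the current number at heading level i + 1
    numbered = []
    for style_name, text in paragraphs:
        match = HEADING_RE.match(style_name or "")
        if match is None:
            numbered.append(text)
            continue
        level = int(match.group(1))
        del counters[level:]           # a new heading resets all deeper levels
        while len(counters) < level:   # pad if a level was skipped
            counters.append(0)
        counters[level - 1] += 1
        prefix = ".".join(str(n) for n in counters)
        numbered.append(prefix + ". " + text)
    return numbered


def read_docx_paragraphs(path):
    """Return (style_name, text) pairs using python-docx (third-party)."""
    from docx import Document  # imported lazily; requires `pip install python-docx`

    return [(p.style.name, p.text) for p in Document(path).paragraphs]
```

Usage would be something like `number_paragraphs(read_docx_paragraphs(url))`. If the author applied numbering through a list style rather than headings, the style check above will not match and would need to be adapted.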