
How to extract text from paragraphs and tables with a Python module from a Word document that contains objects such as an embedded Excel sheet?

How can I extract only the text from paragraphs and tables, using a Python module, from a Word document that contains objects like hyperlinks, images, and an attached Excel sheet?

I tried docx2python, but it only works for simple .docx files, not for ones that contain links or an embedded Excel file.

Would this work?

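import docx  # provided by the python-docx package (pip install python-docx)

FILEPATH = "example.docx"  # placeholder: path to your Word document

doc = docx.Document(FILEPATH)

text = []

for paragraph in doc.paragraphs:
    line = [run.text for run in paragraph.runs]
    if line:
        # Join the runs of this paragraph and keep one string per paragraph
        text.append(''.join(line))

# Printing out final results, one paragraph per line
result = '\n'.join(text)
print(result)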
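One caveat with the run-based loop: in python-docx, Paragraph.runs only returns the w:r elements that are direct children of the paragraph, so text inside a w:hyperlink element can be missed. A minimal sketch that sidesteps this reads the XML directly with the standard library. It assumes the file is a standard .docx package (a ZIP archive whose main text lives in word/document.xml); the function name docx_paragraphs is hypothetical, and only top-level body paragraphs are returned (tables are skipped).

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of WordprocessingML elements (w:p, w:r, w:t, ...) in document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path):
    """Return the non-empty top-level body paragraphs of a .docx file as strings.

    Hypothetical helper. Every <w:t> under a paragraph is collected, including
    those nested in <w:hyperlink>. Images and embedded objects (such as an
    attached Excel sheet) normally contribute no <w:t> text to document.xml.
    """
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    # findall() matches direct children only, so paragraphs inside tables are skipped
    for p in root.find(W + "body").findall(W + "p"):
        text = "".join(t.text or "" for t in p.iter(W + "t"))
        if text:
            paragraphs.append(text)
    return paragraphs
```

Usage, with a hypothetical file name: print("\n".join(docx_paragraphs("report.docx"))). Text in text boxes would also be picked up, since it is stored as w:t inside the paragraph that anchors the box.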

Also, for reading tables in documents, maybe you can use this: https://github.com/gressa-cpu/Python-Code-to-Share/blob/main/read_word_table.py
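As a standard-library alternative (not the linked script's code), here is a sketch that reads tables straight from word/document.xml, assuming a standard .docx package. The name docx_tables is hypothetical. Each top-level w:tbl becomes a list of rows, and each row a list of cell strings; merged cells are not expanded and nested tables inside a cell are ignored.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of WordprocessingML elements (w:tbl, w:tr, w:tc, ...) in document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _text(elem):
    # Concatenate all <w:t> text under an element (plain runs and hyperlinks alike)
    return "".join(t.text or "" for t in elem.iter(W + "t"))


def docx_tables(path):
    """Return every top-level table in a .docx body as a list of rows of cell text.

    Hypothetical helper; embedded objects and images contribute no <w:t> text.
    """
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    tables = []
    for tbl in root.find(W + "body").findall(W + "tbl"):
        rows = []
        for tr in tbl.findall(W + "tr"):
            # A cell may hold several paragraphs; keep them on separate lines
            rows.append(["\n".join(_text(p) for p in tc.findall(W + "p"))
                         for tc in tr.findall(W + "tc")])
        tables.append(rows)
    return tables
```

Each table comes back as plain lists, so it can be passed directly to csv.writer(...).writerows(...) if you need a file per table.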
