
How to extract text boxes and flowcharts from Word (doc, docx) files using Python?

I am writing automation code to extract flow diagrams and shapes (non-image), along with the text inside them, from Word files (doc, docx).

I tried using rIds (relationship IDs) to identify them with Python, but that failed. Can anyone suggest a better solution?

TIA

You can use the PyDocX library to convert the docx file to HTML and then extract the images from the <img> tags in the output. The conversion is fairly good, so it is worth a try.

from pydocx import PyDocX
html = PyDocX.to_html(directory + 'hey.docx')
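If what you need is the text inside text boxes and flowchart shapes rather than images, another option is to read the document XML directly. A .docx file is a ZIP archive, and shape text is normally stored in w:txbxContent elements inside word/document.xml. The following is only a minimal sketch using the standard library; the function names are made up for illustration, it ignores headers, footers and nested text boxes, and it assumes legacy .doc files have already been converted to .docx.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by WordprocessingML and markup compatibility.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
MC = "http://schemas.openxmlformats.org/markup-compatibility/2006"


def textbox_texts_from_xml(xml_bytes):
    """Return one string per text box found in a document.xml payload."""
    root = ET.fromstring(xml_bytes)

    # Newer files store each shape twice: once in mc:Choice (DrawingML)
    # and once in mc:Fallback (VML). Drop the fallback copy so every
    # shape is reported only once.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == f"{{{MC}}}Fallback":
                parent.remove(child)

    texts = []
    for box in root.iter(f"{{{W}}}txbxContent"):
        paragraphs = []
        for para in box.iter(f"{{{W}}}p"):
            # Join all text runs of the paragraph.
            runs = (t.text or "" for t in para.iter(f"{{{W}}}t"))
            paragraphs.append("".join(runs))
        texts.append("\n".join(paragraphs))
    return texts


def extract_textbox_texts(docx_path):
    """Open a .docx (path or file-like object) and return its text box texts."""
    with zipfile.ZipFile(docx_path) as archive:
        return textbox_texts_from_xml(archive.read("word/document.xml"))
```

Calling extract_textbox_texts(directory + 'hey.docx') would then return a list with one entry per shape that contains text. The connectors between flowchart shapes carry no text, so recovering the diagram structure itself would need additional parsing of the drawing elements.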
