I am writing automation code to extract flow diagrams and shapes (non-image objects), along with the text inside them, from Word files (.doc, .docx).
I tried using rIds (relationship IDs) to identify them with Python, but that failed. Can anyone suggest a better solution?
TIA
You can use the PyDocX library to convert the .docx file to HTML and then extract the images using the tags in the resulting HTML. The conversion is pretty good; you can give it a try:
from pydocx import PyDocX
html = PyDocX.to_html(directory + 'hey.docx')
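Since the question is about non-image shapes, another option may be to read the document XML directly. The following is only a minimal sketch, not a tested solution for every file: it assumes a .docx file (a legacy .doc would first have to be converted), and it assumes that shape text is stored in w:txbxContent elements inside word/document.xml, which is how Word usually writes text boxes and shapes with text. The function names shape_texts_from_xml and extract_shape_texts are made up for this example. It uses only the standard library.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used in word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
MC = "http://schemas.openxmlformats.org/markup-compatibility/2006"


def _textboxes(element):
    """Yield w:txbxContent elements below `element`, skipping mc:Fallback copies."""
    for child in element:
        if child.tag == f"{{{MC}}}Fallback":
            # Word often stores a legacy VML copy of the same shape here;
            # skipping it avoids reporting the same text twice.
            continue
        if child.tag == f"{{{W}}}txbxContent":
            yield child
        else:
            yield from _textboxes(child)


def shape_texts_from_xml(xml_bytes):
    """Return one string per shape that contains text."""
    root = ET.fromstring(xml_bytes)
    texts = []
    for box in _textboxes(root):
        paragraphs = []
        for paragraph in box.iter(f"{{{W}}}p"):
            runs = (t.text or "" for t in paragraph.iter(f"{{{W}}}t"))
            paragraphs.append("".join(runs))
        text = "\n".join(paragraphs).strip()
        if text:
            texts.append(text)
    return texts


def extract_shape_texts(docx_path):
    """Open the .docx (a zip archive) and read shape text from the main document part."""
    with zipfile.ZipFile(docx_path) as archive:
        return shape_texts_from_xml(archive.read("word/document.xml"))
```

Usage would look like extract_shape_texts(directory + 'hey.docx'). Shapes without text (for example connector lines) are not reported, and the sketch does not reconstruct how the shapes of a flow diagram are connected.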