How to extract text boxes and flowcharts from Word (doc, docx) files using Python?

Question

I am writing automation code to extract flow diagrams and shapes (non-image), along with the text inside them, from Word files (doc, docx).

I tried using rIds (relationship IDs) to identify them in Python, but that failed. Can anyone suggest a better solution?

TIA

Answer 1

You can use the PyDocX library to convert the docx file to HTML and then extract the images from the tags in the output. The conversion is pretty good, so it is worth a try.

from pydocx import PyDocX
html = PyDocX.to_html(directory + 'hey.docx')
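
For the "extract the images using the tags" step, here is a minimal sketch that uses only the standard library. It assumes the converted HTML carries pictures as ordinary <img> tags (possibly with base64 data URIs in src). The names ImgSrcCollector and collect_img_sources are made up for this example and are not part of PyDocX. Whether drawn shapes and text boxes survive the conversion is something to check on your own files.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in an HTML string."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <IMG SRC=...>
        # is handled the same way as <img src=...>.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def collect_img_sources(html):
    """Return the src values of all <img> tags, in document order."""
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    return parser.sources
```

With the HTML string returned by PyDocX.to_html(...), calling collect_img_sources(html) should give you one entry per embedded picture, which you can then decode or save as needed.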

1 answer

solution1
0 2019-11-26 10:19:53
