I'm using doxygen to generate rtf output, but I need the rtf converted to docx in an automated way using python to run on a build system.
Input: example.rtf
Output: example.docx
I don't want to change any of the styling, formatting, or content. Just do a direct conversion. The same way it would be done manually by opening the.rtf in word and then doing SaveAs.docx
I spent a lot of time and energy trying to figure this out, so I thought I would post the question and solution for the community. It was actually very simple in the end, but took a long time to find the correct information to accomplish this.
This solution requires the following:
#convert rtf to docx and embed all pictures in the final document
def ConvertRtfToDocx(rootDir, file):
word = win32com.client.Dispatch("Word.Application")
wdFormatDocumentDefault = 16
wdHeaderFooterPrimary = 1
doc = word.Documents.Open(rootDir + "\\" + file)
for pic in doc.InlineShapes:
pic.LinkFormat.SavePictureWithDocument = True
for hPic in doc.sections(1).headers(wdHeaderFooterPrimary).Range.InlineShapes:
hPic.LinkFormat.SavePictureWithDocument = True
doc.SaveAs(str(rootDir + "\\refman.docx"), FileFormat=wdFormatDocumentDefault)
doc.Close()
word.Quit()
As rtf can't have embedded images, this also takes any images from RTF and embeds them into the resulting word docx, so there are no external image reference dependencies.
It worked for me to convert.rtf ->.doc ->.docx (Optionally I attached Step 3 to convert to.pdf)
You need to pip install:
pip install pywin32
pip install docx2pdf (optional if you need to have pdf)
Step 1: convert.rtf ->.doc
from glob import glob
import re
import os
import win32com.client as win32
from win32com.client import constants
def rtf_to_doc(file_path):
word = win32.gencache.EnsureDispatch('Word.Application')
doc = word.Documents.Open(file_path)
doc.Activate()
# Rename path with .doc
new_file_abs = os.path.abspath(file_path)
new_file_abs = re.sub(r'\.\w+$', '.doc', new_file_abs)
# Save and Close
word.ActiveDocument.SaveAs(
new_file_abs, FileFormat=constants.wdFormatDocument
)
doc.Close(False)
Step 2: convert.doc ->.docx
import re
import os
from win32com.client import constants
import win32com.client as win32
def doc_to_docx(file_path):
word = win32.gencache.EnsureDispatch('Word.Application')
doc = word.Documents.Open(file_path)
doc.Activate()
# Rename path with .docx
new_file_abs = os.path.abspath(file_path)
new_file_abs = re.sub(r'\.\w+$', '.docx', new_file_abs)
# Save and Close
word.ActiveDocument.SaveAs(
new_file_abs, FileFormat=constants.wdFormatXMLDocument
)
doc.Close(False)
Step 3 (optional if PDF is needed).docx ->.pdf
from docx2pdf import convert
def docx_to_pdf(ori_file, new_file):
convert(ori_file, new_file)
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.