简体   繁体   中英

How to convert RTF to Docx in Python

I'm using doxygen to generate rtf output, but I need the rtf converted to docx in an automated way using python to run on a build system.

  • Input: example.rtf

  • Output: example.docx

I don't want to change any of the styling, formatting, or content. Just do a direct conversion. The same way it would be done manually by opening the.rtf in word and then doing SaveAs.docx

I spent a lot of time and energy trying to figure this out, so I thought I would post the question and solution for the community. It was actually very simple in the end, but took a long time to find the correct information to accomplish this.

This solution requires the following:

  • Python 3.x
  • PyWin32 module
  • Windows 10 environment (haven't tried it with other flavors of Windows)
#convert rtf to docx and embed all pictures in the final document
def ConvertRtfToDocx(rootDir, file):
    word = win32com.client.Dispatch("Word.Application")
    wdFormatDocumentDefault = 16
    wdHeaderFooterPrimary = 1
    doc = word.Documents.Open(rootDir + "\\" + file)
    for pic in doc.InlineShapes:
        pic.LinkFormat.SavePictureWithDocument = True
    for hPic in doc.sections(1).headers(wdHeaderFooterPrimary).Range.InlineShapes:
        hPic.LinkFormat.SavePictureWithDocument = True
    doc.SaveAs(str(rootDir + "\\refman.docx"), FileFormat=wdFormatDocumentDefault)
    doc.Close()
    word.Quit()

As rtf can't have embedded images, this also takes any images from RTF and embeds them into the resulting word docx, so there are no external image reference dependencies.

It worked for me to convert.rtf ->.doc ->.docx (Optionally I attached Step 3 to convert to.pdf)

You need to pip install:

pip install pywin32
pip install docx2pdf (optional if you need to have pdf)

Step 1: convert.rtf ->.doc

from glob import glob
import re
import os
import win32com.client as win32
from win32com.client import constants


def rtf_to_doc(file_path):
    word = win32.gencache.EnsureDispatch('Word.Application')
    doc = word.Documents.Open(file_path)
    doc.Activate()

    # Rename path with .doc
    new_file_abs = os.path.abspath(file_path)
    new_file_abs = re.sub(r'\.\w+$', '.doc', new_file_abs)

    # Save and Close
    word.ActiveDocument.SaveAs(
        new_file_abs, FileFormat=constants.wdFormatDocument
    )
    doc.Close(False)

Step 2: convert.doc ->.docx

import re
import os
from win32com.client import constants
import win32com.client as win32

def doc_to_docx(file_path):
    word = win32.gencache.EnsureDispatch('Word.Application')
    doc = word.Documents.Open(file_path)
    doc.Activate()

    # Rename path with .docx
    new_file_abs = os.path.abspath(file_path)
    new_file_abs = re.sub(r'\.\w+$', '.docx', new_file_abs)

    # Save and Close
    word.ActiveDocument.SaveAs(
        new_file_abs, FileFormat=constants.wdFormatXMLDocument
    )
    doc.Close(False)

Step 3 (optional if PDF is needed).docx ->.pdf

from docx2pdf import convert

def docx_to_pdf(ori_file, new_file):
     convert(ori_file, new_file)
    

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM