
How do I read an .rtf file in Python 3 and store its text as strings in a list?

I have an .rtf file. I want to read it and store its strings in a list using Python 3. Any package is fine, but it must work on both Windows and Linux.

I have tried striprtf, but read_rtf is not working:

from striprtf.striprtf import rtf_to_text
from striprtf.striprtf import read_rtf
rtf = read_rtf("file.rtf")
text = rtf_to_text(rtf)
print(text)

This code fails with the error: cannot import name 'read_rtf'

Can anyone suggest a way to get the strings from an .rtf file in Python 3?

Have you tried this?

with open('yourfile.rtf', 'r') as file:
    text = file.read()
print(text)

For a very large file, read it line by line instead:

with open("yourfile.rtf") as infile:
    for line in infile:
        do_something_with(line)
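
Note that reading the file this way gives you the raw RTF markup (control words such as \par), not plain text, so it still has to be converted. As a minimal sketch, you could first check that the file really is RTF before converting it. RTF documents start with the signature {\rtf; the helper name is_rtf is my own and not part of any library:

```python
def is_rtf(path):
    """Return True if the file starts with the RTF signature '{\\rtf'."""
    # Open in binary mode so the check does not depend on the platform's
    # default text encoding (works the same on Windows and Linux).
    with open(path, "rb") as f:
        return f.read(5) == b"{\\rtf"
```

You would call it as is_rtf("yourfile.rtf") and only pass the file's content on to a converter when it returns True.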

Try using this:

from striprtf.striprtf import rtf_to_text

sample_text = "any text as a string you want"
text = rtf_to_text(sample_text)

Reading an RTF file and manipulating the data inside it is tricky, and what works depends on the file you have. I tried all of the above and nothing worked; finally, the following code worked for me. Note that it requires Windows with Microsoft Word installed, since it drives Word through pywin32. I hope it helps those who are hunting for a solution.

from win32com.client import Dispatch
 
word = Dispatch('Word.Application') # Open word application
 # word = DispatchEx('Word.Application') # start a separate process
word.Visible = 0 # Run in the background, no display
word.DisplayAlerts = 0 # No warning
 
path = r'C:\Projects\10.1\power.rtf' 
doc = word.Documents.Open(FileName=path, Encoding='gbk')
 
for para in doc.paragraphs:
    print(para.Range.Text)
 
doc.Close()
word.Quit()

If you want to store the content in a single variable, the following code does that.

from win32com.client import Dispatch
 
word = Dispatch('Word.Application') # Open word application
 # word = DispatchEx('Word.Application') # start a separate process
word.Visible = 0 # Run in the background, no display
word.DisplayAlerts = 0 # No warning
 
path = r'C:\Projects\10.1\output_5.rtf' # Use an absolute path; a relative path will fail
doc = word.Documents.Open(FileName=path, Encoding='gbk')

#for para in doc.paragraphs:
#    print(para.Range.Text)


content = '\n'.join([para.Range.Text for para in doc.paragraphs])

print(content)

doc.Close()
word.Quit()

Using rtf_to_text is enough to convert RTF into a string in Python. Read the content of the .rtf file and pass it to rtf_to_text:

from striprtf.striprtf import rtf_to_text
with open("yourfile.rtf") as infile:
    content = infile.read()
    text = rtf_to_text(content)
print(text)
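
Since the question asks for a list of strings, the converted text can be split into lines afterwards. This is only a sketch: the helper names text_to_lines and rtf_file_to_lines are my own (not part of striprtf), and it assumes striprtf is installed and that the file can be read as UTF-8 text with undecodable bytes ignored:

```python
from pathlib import Path


def text_to_lines(text):
    """Split plain text into a list of non-empty, stripped lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def rtf_file_to_lines(path, encoding="utf-8"):
    """Read an .rtf file and return its text as a list of strings."""
    # Imported here so text_to_lines stays usable without striprtf installed.
    from striprtf.striprtf import rtf_to_text

    # errors="ignore" is an assumption: it skips bytes that do not decode.
    raw = Path(path).read_text(encoding=encoding, errors="ignore")
    return text_to_lines(rtf_to_text(raw))
```

Calling rtf_file_to_lines("yourfile.rtf") should then give one list entry per non-empty line. Nothing here is platform-specific, so it should behave the same on Windows and Linux.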
