
How to read an .rtf file, convert it into Python 3 strings, and store them in a Python 3 list?

I have an .rtf file, and I want to read it with Python 3 and store its strings in a list. Any package is fine, but it must work on both Windows and Linux.

I have tried striprtf, but read_rtf is not working:

from striprtf.striprtf import rtf_to_text
from striprtf.striprtf import read_rtf
rtf = read_rtf("file.rtf")
text = rtf_to_text(rtf)
print(text)

But this code fails with the error: cannot import name 'read_rtf'

Can anyone suggest a way to get strings from an .rtf file in Python 3?

Have you tried simply reading it as a text file?

with open('yourfile.rtf', 'r') as file:
    text = file.read()
print(text)
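Note that open().read() returns the raw RTF source, control words included, rather than the visible text. A quick check, using a made-up sample document (SAMPLE_RTF is hypothetical) written to a throwaway file:

```python
import os
import tempfile

# Hypothetical sample RTF document, written to a throwaway file for the demo.
SAMPLE_RTF = r"{\rtf1\ansi\deff0 {\fonttbl {\f0 Arial;}}\f0 Hello, \b world\b0!\par}"
with tempfile.NamedTemporaryFile("w", suffix=".rtf", delete=False) as tmp:
    tmp.write(SAMPLE_RTF)

try:
    with open(tmp.name, "r") as file:
        text = file.read()
finally:
    os.remove(tmp.name)

# The control words (\rtf1, \fonttbl, \b, \par, ...) are still in the string.
print(text)
print("\\fonttbl" in text)  # True
```

So this approach is only a first step; the markup still has to be stripped.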

For a very large file, iterate over it line by line instead of reading it all at once:

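def do_something_with(line):
    print(line, end="")  # placeholder: process each line here

with open("yourfile.rtf") as infile:
    for line in infile:  # reads one line at a time instead of the whole file
        do_something_with(line)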
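Since the question asks for a list, a small generator keeps the streaming behaviour and lets you call list() only when you actually need every line at once. This is a sketch: iter_lines is a hypothetical helper name, and decoding as latin-1 is an assumption (RTF is normally 7-bit ASCII, and latin-1 never fails on stray 8-bit bytes).

```python
import os
import tempfile
from typing import Iterator


def iter_lines(path: str) -> Iterator[str]:
    """Yield the lines of a file one at a time, without trailing newlines."""
    # latin-1 maps every byte to a character, so stray 8-bit bytes never raise
    with open(path, encoding="latin-1") as infile:
        for line in infile:  # the file object streams lines; the file is never read whole
            yield line.rstrip("\r\n")


# Demo on a throwaway file (hypothetical content): the lines are still raw RTF markup.
with tempfile.NamedTemporaryFile("w", suffix=".rtf", delete=False) as tmp:
    tmp.write("{\\rtf1\\ansi\n\\b Hello\\b0\\par\n}\n")
try:
    lines = list(iter_lines(tmp.name))  # build a list only if you need all lines at once
finally:
    os.remove(tmp.name)
print(lines)
```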

Try passing the RTF content to rtf_to_text as a string:

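from striprtf.striprtf import rtf_to_text

# rtf_to_text converts RTF markup held in a str; it does not open files itself
sample_text = r"{\rtf1\ansi Hello, world!\par}"  # any RTF content as a string
text = rtf_to_text(sample_text)
print(text)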
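To give an idea of the kind of work rtf_to_text does, here is a deliberately minimal, standard-library-only sketch. It is not striprtf's implementation and only handles simple files: it skips a few non-text destination groups (font table, colour table, stylesheet, info, pictures, and \* groups), maps \par, \line and \tab, decodes \'hh escapes as cp1252, and drops \uN Unicode escapes while keeping their fallback character. strip_rtf, TOKEN, SKIP_DESTINATIONS and SPECIAL are names made up for this example.

```python
import re

# One alternative per RTF token kind (order matters):
TOKEN = re.compile(
    r"\\([a-z]+)(-?\d+)? ?"   # control word, optional numeric parameter, one delimiter space
    r"|\\'([0-9a-fA-F]{2})"   # \'hh: a byte given as two hex digits
    r"|\\(.)"                 # control symbol such as \\, \{, \}, \*
    r"|([{}])"                # group start or end
    r"|([^\\{}\r\n]+)"        # run of literal text
    r"|[\r\n]",               # raw line breaks carry no meaning in RTF
    re.DOTALL,
)

# Hypothetical, non-exhaustive set of destinations whose content is not visible text.
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}
# Control words that map to plain-text characters.
SPECIAL = {"par": "\n", "line": "\n", "tab": "\t"}


def strip_rtf(rtf: str, encoding: str = "cp1252") -> str:
    """Return the visible text of a simple RTF document (minimal sketch)."""
    out = []
    stack = []     # skip flag of each enclosing group
    skip = False   # True while inside a group whose text should be dropped
    for m in TOKEN.finditer(rtf):
        word, _param, hexcode, symbol, brace, text = m.groups()
        if brace == "{":
            stack.append(skip)          # nested groups inherit the current skip flag
        elif brace == "}":
            skip = stack.pop() if stack else False
        elif word:
            if word in SKIP_DESTINATIONS:
                skip = True
            elif not skip and word in SPECIAL:
                out.append(SPECIAL[word])
        elif symbol:
            if symbol == "*":           # \* marks an optional destination: skip it
                skip = True
            elif not skip and symbol in "\\{}":
                out.append(symbol)      # escaped backslash or brace
        elif hexcode:
            if not skip:
                out.append(bytes([int(hexcode, 16)]).decode(encoding, errors="replace"))
        elif text and not skip:
            out.append(text)
    return "".join(out)


print(strip_rtf(r"{\rtf1\ansi {\fonttbl {\f0 Arial;}}\f0 Hello, \b world\b0!\par Caf\'e9\par}"))
```

For real documents a maintained package such as striprtf is the safer choice; the sketch only shows why plain reading is not enough.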

Reading an RTF file and manipulating the data inside it is tricky, and what works depends on the file you have. None of the approaches above worked for me; in the end, the following code did. Note that it drives Microsoft Word through COM, so it needs Windows, an installed copy of Word, and the pywin32 package (which provides win32com). I hope it helps others looking for a solution.

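from win32com.client import Dispatch  # pywin32; needs Windows with Microsoft Word installed

word = Dispatch('Word.Application')  # Open the Word application
# from win32com.client import DispatchEx
# word = DispatchEx('Word.Application')  # alternatively, start a separate Word process
word.Visible = False  # Run in the background, no display
word.DisplayAlerts = 0  # No warnings

path = r'C:\Projects\10.1\power.rtf'  # Use an absolute path
doc = word.Documents.Open(FileName=path, Encoding='gbk')
try:
    for para in doc.Paragraphs:
        print(para.Range.Text)
finally:
    # Close the document and quit Word even if reading the paragraphs fails
    doc.Close()
    word.Quit()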

If you want to store the text in a single variable, the following code will do it.

from win32com.client import Dispatch  # pywin32; needs Windows with Microsoft Word installed

word = Dispatch('Word.Application')  # Open the Word application
# from win32com.client import DispatchEx
# word = DispatchEx('Word.Application')  # alternatively, start a separate Word process
word.Visible = False  # Run in the background, no display
word.DisplayAlerts = 0  # No warnings

# Use an absolute path; a relative path may be resolved incorrectly
path = r'C:\Projects\10.1\output_5.rtf'
doc = word.Documents.Open(FileName=path, Encoding='gbk')
try:
    # One string per paragraph; rstrip drops Word's trailing paragraph mark ('\r')
    paragraphs = [para.Range.Text.rstrip('\r') for para in doc.Paragraphs]
finally:
    doc.Close()
    word.Quit()

content = '\n'.join(paragraphs)  # the whole document as a single string
print(content)

Using rtf_to_text is enough to convert RTF into a string in Python: read the content of the RTF file yourself and pass it to rtf_to_text.

from striprtf.striprtf import rtf_to_text
with open("yourfile.rtf") as infile:
    content = infile.read()
    text = rtf_to_text(content)
print(text)
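rtf_to_text returns a single string; to end up with the list the question asks for, split it into lines. A small sketch: text_to_list and its keep_blank parameter are hypothetical names, and dropping blank lines by default is an assumption about what you want.

```python
from typing import List


def text_to_list(text: str, keep_blank: bool = False) -> List[str]:
    """Split plain text into a list with one string per line.

    Trailing whitespace is removed from every line; blank lines are dropped
    unless keep_blank is True.
    """
    lines = [line.rstrip() for line in text.splitlines()]
    return lines if keep_blank else [line for line in lines if line]


# Usage with striprtf (needs `pip install striprtf` and an existing file):
# from striprtf.striprtf import rtf_to_text
# with open("yourfile.rtf") as infile:
#     lines = text_to_list(rtf_to_text(infile.read()))

print(text_to_list("First line\n\nSecond line  \n"))  # ['First line', 'Second line']
```

Splitting with splitlines() rather than split("\n") also handles "\r\n" line endings, which matters if the same code runs on Windows and Linux.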
