
How to read an .rtf file and convert it into Python 3 strings that can be stored in a list?

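I have an .rtf file and want to read it with Python 3, using any package, and store its strings in a list. The solution should work on both Windows and Linux.

I tried striprtf, but read_rtf does not work.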

from striprtf.striprtf import rtf_to_text
from striprtf.striprtf import read_rtf
rtf = read_rtf("file.rtf")
text = rtf_to_text(rtf)
print(text)

This code fails with the error: cannot import name 'read_rtf'

Can anyone suggest a way to get the strings out of an .rtf file in Python 3?

Have you tried this?

with open('yourfile.rtf', 'r') as file:
    text = file.read()
print(text)

For very large files, try this:

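def do_something_with(line):
    print(line.rstrip("\n"))  # placeholder: replace with your own processing

with open("yourfile.rtf") as infile:
    for line in infile:  # the file is read lazily, one line at a time
        do_something_with(line)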
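
One caveat when reading the file directly with open(): the result is the raw RTF source, control words and braces included, not the visible text. A minimal sketch to illustrate this, using a tiny hand-written RTF document (the sample content, the `read_raw` helper, and the temporary file are all assumptions made for the demo):

```python
import os
import tempfile

# A tiny hand-written RTF document (illustrative sample, not from the question).
SAMPLE_RTF = r"{\rtf1\ansi{\fonttbl{\f0 Arial;}}\f0 Hello\par World\par}"


def read_raw(path):
    """Return the file content exactly as it is stored on disk."""
    with open(path, "r", encoding="ascii", errors="replace") as f:
        return f.read()


# Write the sample to a temporary file, then read it back.
with tempfile.NamedTemporaryFile(
    "w", suffix=".rtf", delete=False, encoding="ascii"
) as tmp:
    tmp.write(SAMPLE_RTF)
    tmp_path = tmp.name

raw = read_raw(tmp_path)
os.remove(tmp_path)

print(raw)  # still contains \rtf1, \par, braces, and so on
```

The markup still has to go through a converter such as rtf_to_text before it is usable as plain text.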

Try this:

from striprtf.striprtf import rtf_to_text

sample_text = "any text as a string you want"
text = rtf_to_text(sample_text)

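Reading an RTF file and working with its data is tricky and depends on the file you have. None of the approaches above worked for me; in the end, the following code did. I hope it helps anyone looking for a solution.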

from win32com.client import Dispatch
 
word = Dispatch('Word.Application') # Open word application
# word = DispatchEx('Word.Application')  # alternative: start a separate Word process (needs: from win32com.client import DispatchEx)
word.Visible = 0 # Run in the background, no display
word.DisplayAlerts = 0 # No warning
 
path = r'C:\Projects\10.1\power.rtf' 
doc = word.Documents.Open(FileName=path, Encoding='gbk')
 
for para in doc.paragraphs:
    print(para.Range.Text)
 
doc.Close()
word.Quit()

If you want to store everything in a single variable, the following code does that.

from win32com.client import Dispatch
 
word = Dispatch('Word.Application') # Open word application
# word = DispatchEx('Word.Application')  # alternative: start a separate Word process (needs: from win32com.client import DispatchEx)
word.Visible = 0 # Run in the background, no display
word.DisplayAlerts = 0 # No warning
 
path = r'C:\Projects\10.1\output_5.rtf'  # Use an absolute path; a relative path may not resolve correctly
doc = word.Documents.Open(FileName=path, Encoding='gbk')

#for para in doc.paragraphs:
#    print(para.Range.Text)


content = '\n'.join([para.Range.Text for para in doc.paragraphs])

print(content)

doc.Close()
word.Quit()
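
Text taken from Word paragraphs normally ends with the paragraph mark (a carriage return, `\r`), so printing or joining it as-is can leave stray control characters and empty entries. A minimal sketch of cleaning the paragraph strings into a list; `clean_paragraphs` is a hypothetical helper, and the sample strings stand in for the `para.Range.Text` values so that the snippet runs without Word:

```python
def clean_paragraphs(paragraph_texts):
    """Strip Word's trailing paragraph marks and drop empty paragraphs."""
    cleaned = []
    for text in paragraph_texts:
        # "\r" is the paragraph mark; "\x07" is the end-of-cell marker that
        # Word appends to text inside tables.
        stripped = text.strip("\r\n\x07 ")
        if stripped:
            cleaned.append(stripped)
    return cleaned


# Stand-in for [para.Range.Text for para in doc.paragraphs].
sample = ["Title\r", "\r", "First paragraph.\r"]
print(clean_paragraphs(sample))
```

Keep in mind that the win32com approach needs Windows with Microsoft Word installed, so it does not cover the Linux requirement in the question.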

rtf_to_text is enough to convert RTF into a Python string. Read the content of the .rtf file and pass it to rtf_to_text:

from striprtf.striprtf import rtf_to_text
with open("yourfile.rtf") as infile:
    content = infile.read()
    text = rtf_to_text(content)
print(text)
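
Since the question asks for the strings in a list, the converted text can then be split into lines. A minimal sketch, assuming the striprtf package is installed and that one list item per non-empty line is what is wanted; `text_to_list` and `rtf_file_to_list` are hypothetical helper names, not part of striprtf, and the `encoding`/`errors` settings are an assumption that may need adjusting for your files:

```python
def text_to_list(text):
    """Split plain text into a list of non-empty, stripped lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def rtf_file_to_list(path):
    """Read an RTF file and return its visible text as a list of strings."""
    # Imported here so text_to_list stays usable without striprtf installed.
    from striprtf.striprtf import rtf_to_text

    # RTF source is normally plain ASCII with escapes; errors="replace" is a
    # defensive assumption for files that contain stray 8-bit bytes.
    with open(path, "r", encoding="ascii", errors="replace") as infile:
        content = infile.read()
    return text_to_list(rtf_to_text(content))


# The pure helper can be tried on any string:
print(text_to_list("first line\n\n  second line  \n"))
```

Calling `rtf_file_to_list("yourfile.rtf")` would then give one string per non-empty line. Nothing here depends on the operating system, so it should behave the same on Windows and Linux.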
