I need to convert a Word document into HTML and then save it to a .txt file with lines no longer than 100 characters (a later process won't pick up more than 255 characters unless they are on separate lines).
So far, I've managed to convert the .docx file into HTML and write that variable to a .txt file (a better solution is welcome). However, I can't figure out how to split the lines. Is there a built-in function that could achieve this?
    import mammoth

    with open(r'C:\Users\uXXXXXX\Downloads\Test_Script.docx', "rb") as docx_file:
        result = mammoth.convert_to_html(docx_file)
        html = result.value  # The generated HTML
        messages = result.messages  # Any messages, such as warnings during conversion

    with open(r'C:\Users\uXXXXXX\Downloads\Output.txt', 'w') as text_file:
        text_file.write(html)
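In that case, you can just insert a newline after every 100 characters:

    html = "..."
    i = 100
    while i < len(html):
        # Insert a newline at position i, then skip past the next 100 characters plus that newline
        html = html[:i] + "\n" + html[i:]
        i += 101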
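As for a built-in option: the standard library's textwrap module can do the wrapping, and plain slicing does the same job as the loop above without rebuilding the string on every pass. The sketch below shows both; the function names split_fixed and split_on_whitespace are made up for illustration, and the width of 100 is taken from the question. Note that textwrap normalises whitespace by default, so its output is not character-for-character identical to the input.

```python
import textwrap


def split_fixed(text: str, width: int = 100) -> str:
    """Insert a newline after every `width` characters (may cut mid-word or mid-tag)."""
    return "\n".join(text[i:i + width] for i in range(0, len(text), width))


def split_on_whitespace(text: str, width: int = 100) -> str:
    """Wrap at whitespace where possible; words longer than `width` are still cut."""
    return "\n".join(textwrap.wrap(text, width=width))


if __name__ == "__main__":
    html = "<p>" + "some words " * 30 + "</p>"  # stand-in for result.value
    print(split_fixed(html))
    print(split_on_whitespace(html))
```

One thing to keep in mind: a browser treats a newline in HTML text as whitespace, so a cut in the middle of a word will show up as a space when the page is rendered. If the output is ever displayed as HTML, wrapping at whitespace is probably the safer choice.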