Python html2text adds random \n

When I use the html2text Python package to convert HTML to Markdown, it adds '\n' to the text. I also see this behaviour when trying the demo at http://www.aaronsw.com/2002/html2text/

Is there any way to turn this off? Of course I can remove them myself, but there might be occurrences of '\n' in the original text which I don't want to remove.

    html2text('Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.')

    u'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod\ntempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,\nquis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo\nconsequat. Duis aute irure dolor in reprehenderit in voluptate velit esse\ncillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non\nproident, sunt in culpa qui officia deserunt mollit anim id est laborum.\n\n'

In the latest version of html2text, do this:

    import html2text

    h = html2text.HTML2Text()
    h.body_width = 0  # 0 disables line wrapping
    note = h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!")

This removes the word wrapping that html2text otherwise applies.
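As a rough illustration of what that hard wrapping does to a string, the sketch below uses Python's standard textwrap module rather than html2text itself. The width of 78 columns is an assumption chosen to match the output shown in the question.

```python
import textwrap

text = (
    "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod "
    "tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, "
    "quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo "
    "consequat."
)

# Hard-wrap at an assumed width of 78 columns: the space before any word that
# would push a line past the limit is replaced by a newline.
wrapped = textwrap.fill(text, width=78)

print(repr(wrapped))

# Turning the inserted newlines back into spaces recovers the input here,
# because this sample text contains no newlines of its own.
print(wrapped.replace("\n", " ") == text)
```

This also shows why stripping newlines after the fact is risky: once wrapped, an inserted '\n' is indistinguishable from one that was already in the text.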

Looking at the source of html2text.py, it looks like you can disable the wrapping behavior by setting BODY_WIDTH to 0. Something like this:

    import html2text

    html2text.BODY_WIDTH = 0
    text = html2text.html2text('...')

Of course, resetting BODY_WIDTH globally changes the module's behavior for every caller. If I needed this functionality, I'd probably patch the module, adding a parameter to html2text() that controls wrapping per call, and send that patch back to the author.
