Using html2text and cleaning up some text in Python
I'm using Html2Text to convert HTML code into text. It works very well, but I can't find many examples or much documentation on the internet.
I'm reading the user's name this way:
    text_to_gain = hxs.xpath('//div[contains(@id,"yq-question-detail-profile-img")]/a/img/@alt').extract()
    if text_to_gain:
        h = html2text.HTML2Text()
        h.ignore_links = True
        item['author'] = h.handle(text_to_gain[0])
    else:
        item['author'] = "anonymous"
But my output is this:
u'Duncan\n\n'
The \n is useful when I read a long text or message, but for a single string like this one I want to keep only the name:
'Duncan'
You can also do it like this: just remove the '\n' characters:
>>> st = 'Duncan\n\n'
>>> st.replace('\n', '')
'Duncan'
>>>
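As a possible alternative to the replace call above, here is a minimal sketch that uses str.strip(), which removes only leading and trailing whitespace. The helper name clean_author and its default argument are hypothetical and not part of html2text; the input strings stand in for what h.handle() returned in the question.

```python
def clean_author(raw, default="anonymous"):
    """Return raw without surrounding whitespace, or default if nothing is left.

    Hypothetical helper for illustration; raw stands in for the string
    that h.handle() produced in the question.
    """
    if raw is None:
        return default
    name = raw.strip()  # drops leading/trailing spaces, tabs and newlines
    return name if name else default


# strip() leaves newlines inside the text alone ...
print(repr(clean_author("Duncan\n\n")))        # 'Duncan'
print(repr(clean_author("Mary\nAnn\n\n")))     # 'Mary\nAnn'

# ... whereas replace() removes every newline, including inner ones.
print(repr("Mary\nAnn\n\n".replace("\n", "")))  # 'MaryAnn'
```

For a single-line name both approaches give the same result. The difference only shows up when the text contains line breaks in the middle that should be kept.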