Using html2text and cleaning up some text in Python

Question

I'm using html2text to convert HTML code into text. It works very well, but I can't find many examples or much documentation online.

I'm reading the user's name this way:

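import html2text

# hxs (presumably a Scrapy selector) and item come from the surrounding spider callback
text_to_gain = hxs.xpath('//div[contains(@id,"yq-question-detail-profile-img")]/a/img/@alt').extract()
if text_to_gain:
    h = html2text.HTML2Text()
    h.ignore_links = True  # leave hyperlinks out of the converted text
    item['author'] = h.handle(text_to_gain[0])
else:
    item['author'] = "anonymous"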

But my output is this:

u'Duncan\n\n'

The \n characters are useful when I read long texts or messages, but for a single short string like this one I want to keep only the name:

'Duncan'

Answer 1

Use the strip() method. It removes all leading and trailing whitespace (including the trailing \n characters) but leaves whitespace inside the string alone.

>>> a = u'Duncan\n\n'
>>> a
u'Duncan\n\n'
>>> a.strip()
u'Duncan'
>>> str(a.strip())
'Duncan'

Answer 2

You can also remove the '\n' characters with replace(). Note that this deletes every newline, including any inside the text:

>>> st = 'Duncan\n\n'
>>> st.replace('\n', '')
'Duncan'
>>>
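The difference between the two answers only shows up on text that spans several lines. The sketch below (Python 3, using only built-in str methods; the sample string is made up) compares replace(), strip(), and rstrip("\n") on the same input.

```python
text = "  First line\nSecond line\n\n"

# replace() deletes every newline, so the two lines run together.
replaced = text.replace("\n", "")

# strip() trims whitespace at both ends; the inner line break survives.
stripped = text.strip()

# rstrip("\n") trims only trailing newlines; the leading spaces stay.
right_trimmed = text.rstrip("\n")

print(repr(replaced))       # '  First lineSecond line'
print(repr(stripped))       # 'First line\nSecond line'
print(repr(right_trimmed))  # '  First line\nSecond line'
```

For a single name like 'Duncan\n\n' all three give the same result; for longer messages converted by html2text, strip() or rstrip("\n") keeps the internal line breaks that replace() would lose.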

Answer 1: 5 votes, accepted, 2015-10-18 14:47:21

Answer 2: 0 votes, 2015-10-18 15:15:46