Matching string patterns in Python

Question

I have a string that can contain links:

<a href="http://site1.com/">Hello</a> <a href="http://site2.com/">Hello2</a>
<a href="http://site3.com">Hello3</a> ...

How can I extract the text of all the HTML tags ("Hello", "Hello2", "Hello3", ...) rather than the links? Ideally I would get a list containing all of these texts.

Answer 1

Using lxml:

import lxml.html as LH

content = '''
<a href="http://site1.com/">Hello</a> <a href="http://site2.com/">Hello2</a>
<a href="http://site3.com">Hello3</a>
<a href="/">go <b>home</b>, dude!</a>
'''

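# Parse the fragment; several top-level elements get wrapped in a single parent element
doc = LH.fromstring(content)
# text_content() joins all text inside each <a>, including text in nested tags like <b>
texts = [elt.text_content() for elt in doc.xpath('//a')]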
print(texts)

yields

['Hello', 'Hello2', 'Hello3', 'go home, dude!']
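
If installing lxml is not an option, a similar result can be sketched with the standard library's html.parser.HTMLParser. This is a swapped-in technique, not part of the answer above, and the names LinkTextParser and link_texts are hypothetical. It collects every text fragment seen between an opening <a> and its closing </a>, so text inside nested tags such as <b> is kept, and HTML entities are decoded.

```python
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Hypothetical helper: collects the text inside every <a> element."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.depth = 0      # how many <a> elements we are currently inside
        self.current = []   # text fragments of the link being read
        self.texts = []     # finished link texts, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            if self.depth == 0:
                self.current = []  # start collecting a new link's text
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.texts.append("".join(self.current))

    def handle_data(self, data):
        # Only keep text that appears inside an <a> element
        if self.depth > 0:
            self.current.append(data)


def link_texts(html):
    """Return the text of every <a> element in the HTML string."""
    parser = LinkTextParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.texts


content = '''
<a href="http://site1.com/">Hello</a> <a href="http://site2.com/">Hello2</a>
<a href="http://site3.com">Hello3</a>
<a href="/">go <b>home</b>, dude!</a>
'''

print(link_texts(content))
# ['Hello', 'Hello2', 'Hello3', 'go home, dude!']
```

Unlike lxml, HTMLParser does not build a tree or repair broken markup, so this sketch assumes reasonably well-formed input where every <a> is closed.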

Answer 1 (1 vote, accepted) 2012-11-16 22:50:58